1.0 SUMMARY


In this work, we have created an on-the-fly hybrid analysis framework that is easy to deploy
and can perform large-scale automatic software analysis in real-world enterprises. Deployment is
easy because our analysis process requires no additional software systems. The framework harnesses
the power of generic programming and high-performance graph processing to efficiently perform
analysis based on control-flow, data-flow, and points-to information.

We then explored several deployment scenarios, including leveraging runtime information to guide
concolic execution by APEx, our newly developed concolic execution engine for Android. The
main motivation for this activity is to validate the hypothesis that malicious code exists in rarely
executed paths and that exploiting JIT compiler information gives us the information needed to
generate event sequences that target these paths. Our evaluation results revealed that, despite our best
efforts to provide substantial code coverage, about half of the malware locations in the engagement
apps were not exercised by random and unit testing. We then applied our concolic execution
engine to generate inputs that reach these unexecuted locations. However, our concolic execution engine
still faces many challenges, including path explosion when dealing with system APIs and library
calls. Our current focus is to overcome this issue through automatic modeling of these APIs.

In addition, we applied our framework to identify potential communication channels that can be
used by colluding applications to carry out attacks. To demonstrate the scalability of our framework,
we performed large-scale analysis of real-world apps including Facebook and Spotify. We
also applied our framework to simultaneously analyze 90 apps on a device for inter-app connections;
the analysis took only 10 minutes. This feature can be particularly useful for analysts working
in a BYOD environment. Our work also tackled issues raised by the use of reflection by
dynamically identifying reflection targets and capturing them for further analysis.

For future work, we hope to continue to develop JiTANA and APEx. By having the ability to
analyze an entire device instead of a single app at a time, our framework can serve as a foundation
for our research groups and other researchers to develop novel techniques for analyzing collections
of applications. It also provides a direct pathway to deal with attacks due to inter-app communications.
As of now, JiTANA is the only program analysis framework capable of such large-scale
analysis, and APEx is the only concolic execution engine that applies concolic execution to generate
event sequences (i.e., most existing concolic execution engines for Android only apply concolic
execution starting at event handlers). We will release these frameworks under GPL licenses this
summer. We would also like to continue to work with DARPA to further develop our work, and we
hope that the frameworks that we have created fit into the long-term research plan of DARPA.

Also note that we have submitted our work on JiTANA to the International Symposium on Software
Testing and Analysis (ISSTA). The paper and the artifact are being reviewed, and we should
know the decision on April 19th. We will submit our work on reflection to the International
Conference on Object-Oriented Programming, Systems, Languages, and Applications (OOPSLA) on
March 23rd. Our work on APEx will be submitted to the International Conference on Automated
Software Engineering (ASE) on April 29th.
2.0 INTRODUCTION


Software ecosystems involve interacting sets of actors (software components) sitting on top of a
common technological platform that together provide various software solutions or services [21].
Smart-mobile application platforms such as Android are examples of such ecosystems. The Android
platform provides software engineers with IDEs that help them develop GUIs and skeleton
code, and a powerful application framework that they can use to quickly create Java applications
(apps) that run on Android devices. In this way, Google, as the Android provider, can collaborate
with external developers to build apps with different functionalities, which in turn increases the
value of Android. As such, Android is currently the most widely adopted smart-mobile platform
in the world.

Unfortunately, Android is also a frequent target of malware authors. As of January 2014, there
were roughly 10 million known malicious Android apps [14], and this number continues to increase.
During  just  the  first  three  months  of  2015  there  were  nearly  5,000  new  malicious  apps  created  each 
day  for  Android  Q .  As  the  number  of  known  malware  increases  so  does  their  complexity  and 
sophistication,  rendering  them  very  difficult  to  detect. 

To  ensure  the  dependability  and  security  of  software  ecosystems  such  as  Android,  engineers  and 
security  analysts  need  to  be  able  to  isolate  faults  and  security  vulnerabilities  in  those  ecosystems. 
Ideally,  faults  and  vulnerabilities  in  an  app  should  be  detected  prior  to  its  deployment.  As  such, 
software  engineers  and  security  analysts  use  various  software  assurance  processes  in  an  attempt  to 
detect  and  remove  these  as  part  of  the  software  development  process. 

There are many examples, however, of flaws in current software assurance processes. First,
many Android apps suffer from faults and vulnerabilities due to interactions between apps and
framework components. A recent study notes that 23% of Android apps behave differently after
a platform update [10]. This is because many Android apps depend on other software to operate
(e.g., they rely on social media apps for information sharing) [10]. In fact, it has been reported
that about 50% of Android updates have caused previously working apps to fail or have rendered systems
unstable [10, 11, 30]. There have also been increasing occurrences of collusion attacks, in which
multiple apps work together to perform malicious acts [6, 19]. Program analysis techniques that
focus on single apps are not effective for detecting faults and vulnerabilities that involve interactions
among multiple software components or multiple apps. As such, these faults and vulnerabilities
elude testing and are discovered after system deployment, increasing both costs and the overall
attack surface of the software system. Clearly, approaches that allow engineers and analysts to
cost-effectively analyze interactions among software components would allow more faults and
vulnerabilities to be detected during the software development process.

As a second example, there have been reports of malicious apps that escaped vetting by Google
and were admitted to its Play store. Google introduced Bouncer, a black-box
dynamic analysis system that tests for malicious behaviors in submitted apps [23, 26]. This
particular approach requires test inputs that exercise most, if not all, entry points to a program and
its  critical  paths  in  order  to  be  effective  [|^|.  Unfortunately,  having  powerful  test  suites  alone 
is not sufficient for revealing complex vulnerabilities because many vulnerabilities can be
exercised only with specific inputs or input sequences. Therefore, being unable to observe program
execution and generate appropriate event sequences greatly limits the vulnerability detection
effectiveness of Bouncer. As recently as January 2016, malicious apps had been missed by Bouncer
and distributed through the Play store [1]. Clearly, a hybrid analysis approach that can
perform on-the-fly code coverage measurement, provide real-time feedback, and generate the necessary
event sequences to further enhance code coverage could increase the effectiveness of Bouncer by
targeting parts of an app that have not been adequately exercised.

Recent  adoption  of  Bring  Your  Own  Device  (BYOD)  approaches  has  created  an  even  greater 
need  for  security  analysts  to  be  able  to  vet  devices  efficiently  and  effectively.  In  effect,  each  app 
on  a  device  must  be  vetted.  Existing  program  analysis  tools  are  not  capable  of  doing  this  cost- 
effectively,  primarily  due  to  reliance  on  techniques  that  analyze  single  apps  in  isolation  instead 
of  analyzing  multiple  apps  in  unison.  Such  core  analysis  engines  cause  the  frameworks  in  which 
they  are  included  to  be  non-scalable,  inefficient,  and  ineffective.  In  this  case,  an  approach  that  can 
simultaneously  analyze  all  apps  on  a  device  efficiently  can  help  analysts  detect  malicious  apps  on 
the  device  more  effectively. 

In  this  project,  we  have  made  the  following  three  contributions  that  advance  the  state-of-the-art 
in  both  static  and  dynamic  program  analysis. 


1. We introduce JIT-Analysis (JiTANA), a new framework to support static and dynamic program
analysis techniques aimed at vetting Android applications for the presence of software
defects, security vulnerabilities, and malicious intent. As shown in Section 5.0, the proposed
framework is highly scalable and capable of focusing its analysis efforts on more
fruitful program execution paths. We implemented our framework as part of Dalvik, the
virtual machine used in Android, so that it can exploit runtime information (e.g., dynamic
compilation information) readily available inside the virtual machine without the need for
additional software systems.


2.  We  introduce  Android  Path  Explorer  (APEx),  a  concolic  execution  engine  for  Android  that 
is  capable  of  creating  event  sequences  and  input  values  that  can  be  used  by  a  program  to 
reach  specific  paths.  As  has  been  reported  by  event-based  testing  researchers,  generating 
test  cases  with  adequate  coverage  is  a  challenging  problem  0 .  As  such,  instead  of  relying 
on test inputs to explore execution paths, our approach tracks branches off of commonly
executed paths that have not been explored. It then computes event sequences and input
values that can cause the execution of those paths and analyzes the recorded traces. Our
proposed concolic execution engine serves two purposes. First, it allows us to remove paths
that  are  not  feasible  due  to  the  absence  of  inputs.  Second,  it  allows  our  approach  to  explore 
more  paths  than  those  exercised  by  the  test  cases. 


3. We illustrate how the proposed frameworks can be used to tackle emerging security challenges.
We have developed a set of analyses for statically and dynamically detectable behaviors
such as inter-application communications and code coverage. The code coverage
information provides analysts with “suspicious paths”, which are rarely executed paths. The
results are then used to guide APEx, our concolic execution engine, to further explore and
compute concrete values that may exercise these suspicious paths. We create a visualization
engine that provides real-time feedback on an ongoing analysis. We show that JiTANA
is highly scalable by using it to simultaneously analyze all applications on Android devices,
and we show how it can cope with runtime dynamics by analyzing dynamically loaded code.
Figure 1: Architecture of JiTANA

3.0  METHODS,  ASSUMPTIONS,  AND  PROCEDURES 

In this section, we describe the designs and implementations of our two major components:
JiTANA and APEx.

3.1  The  JiTANA  Framework 


Figure 1 provides an architectural view of the JiTANA framework. We designed JiTANA to
be  a  highly  efficient  hybrid  program  analysis  framework,  so  it  needs  to  be  able  to  interface  with 
language  virtual  machines  such  as  Dalvik  or  the  Android  Runtime  System  (ART).  (Our  current 
implementation  supports  only  Dalvik.)  The  interface  with  Dalvik  is  provided  through  the  Analysis 
Controller,  which  is  connected  to  Dalvik  via  Java  Debug  Wire  Protocol  (JDWP)  over  Android 
Debug  Bridge  (ADB).  This  connection  is  established  primarily  for  use  in  dynamic  analyses,  though 
it  also  assists  with  static  analyses  in  cases  in  which  code  is  dynamically  generated  during  program 
initialization  (e.g.,  the  Facebook  app  uses  this  mechanism  during  app  initialization). 

The next major component in the framework is the Class Loader VM (CLVM). This provides the
mechanism used to load classes in an APK along with system-related and dynamically generated
classes. It is constructed based on the Java Language Specification [20]. Once classes are loaded,
the system generates a set of Boost Graph Library compliant VM graphs that include class loader
graphs, class graphs, method graphs, instruction graphs, and field graphs. Various Analysis Engines
then process these to produce control-flow, data-flow, and points-to information, which is
then fed back into the VM graphs. Other information is used to construct Analysis Graphs such as
pointer assignment graphs, context-sensitive call graphs, and an IAC graph. The framework can be
integrated with visualization tools such as TraVIS or run side-by-side with existing visualization
tools such as Graphviz.

Next  we  describe  the  design  rationales  behind  and  implementation  details  of  major  components 
in  the  JiTANA  library. 


3.1.1  Design  Rationales 

To meet our performance and design goals, we implemented JiTANA in C++14 instead of an
object-oriented language such as Java. We made this choice for several reasons:
• JiTANA is designed to work with a virtual machine on a device in various configurations.
It can run on a device or workstation as a stand-alone application communicating with a
virtual machine in real time, using an inter-process communication mechanism or a network
protocol such as the Java Debug Wire Protocol. It can even be embedded within a virtual
machine as a library. Requiring the use of a Java virtual machine in particular would hinder
these use cases.

• JiTANA analyses may require real-time communication with native applications and the
operating system kernel. Achieving this on a virtual machine is difficult, if not impossible,
without using the Java Native Interface; however, that would be an expensive bridging
approach.

• C++ supports the use of the generic programming paradigm with templates. This paradigm
separates algorithms and data structures by defining what is called a concept, a description
of both syntactic and semantic requirements for one or more types [28]. An algorithm
operating on a concept needs to be implemented only once; the same implementation can then
be reused for any concrete type that is a model of the concept. The C++ template instantiation
mechanism coupled with compiler optimizations has been shown to generate code as
efficient as hand-tuned FORTRAN.

To elaborate on this last point, the use of concepts in generic programming differs from the use
of traditional object-oriented techniques in static analysis tools such as Soot and LLVM. We
illustrate the practical difference between generic programming (GP) and object-oriented programming
(OOP) through a simple algorithm max, which returns the larger of two values.

template <typename T> requires TotallyOrdered<T>()
inline T& max(T& x, T& y) {
    // y is returned only when it is strictly larger than x
    return (x < y) ? y : x;
}

Listing  1:  Generic  Algorithm  max  in  C++  with  Concepts 


public static Comparable max(Comparable x, Comparable y) {
    // x.compareTo(y) < 0 means x < y, so y is the larger value
    return (x.compareTo(y) < 0) ? y : x;
}

Listing  2:  Algorithm  max  in  Java  without  Generics 

In GP, the algorithm can be implemented as shown in Listing 1. This implementation can operate
on any type as long as it models the TotallyOrdered concept, which requires operators <, >,
<=, and >= to be defined with the following semantic constraints:

$a > b \iff b < a, \quad a \le b \iff \lnot(b < a), \quad a \ge b \iff \lnot(a < b).$

In OOP, the same algorithm can be implemented as shown in Listing 2. It is less reusable than the
GP version because it requires the type to inherit from a specific type named Comparable.
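
The contrast can also be tried with a compiler that lacks concept support. The following is a minimal sketch, assuming only standard C++14; it emulates the requirement with a type trait and a static_assert instead of a real concept, and it checks only the syntactic part (the presence of operator<), not the semantic constraints above. The names has_less, max_of, and version are hypothetical and are not part of JiTANA.

```cpp
#include <cassert>
#include <string>
#include <type_traits>
#include <utility>

// Compile-time check standing in for a concept: true when `a < b` is a
// valid expression for two values of type T.
template <typename T, typename = void>
struct has_less : std::false_type {};

template <typename T>
struct has_less<T, decltype(void(std::declval<const T&>() <
                                 std::declval<const T&>()))>
    : std::true_type {};

// Generic algorithm: implemented once, usable with every type that
// provides operator<. No common base class is required.
template <typename T>
const T& max_of(const T& x, const T& y) {
    static_assert(has_less<T>::value, "T must provide operator<");
    return (x < y) ? y : x;
}

// A hypothetical user-defined type that satisfies the requirement simply
// by defining operator<; it inherits from nothing.
struct version {
    int major_no;
    int minor_no;
};

inline bool operator<(const version& a, const version& b) {
    return std::make_pair(a.major_no, a.minor_no) <
           std::make_pair(b.major_no, b.minor_no);
}
```

The same max_of template accepts int, std::string, and version without any of them sharing a base type, which is the reuse property the GP version has and the Comparable-based version lacks.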

As shown in Figure 1, JiTANA separates data structures and algorithms. It represents programs
as well-defined “Graph Data Structures”, and all “Analysis Engines” work on these. The core
analysis algorithms are reusable and flexible because they are defined on concepts rather than
concrete types. The data structures are also defined to be similar to those used in the actual virtual
machine to reduce the overhead of exchanging dynamic information.
3.1.2 Graphs


Most of the data structures used in JiTANA are represented as graphs. Typically, a node in such
graphs represents a virtual machine object (e.g., a class, a method, an instruction) together with
analysis information (e.g., execution counts), while an edge represents a relationship between two
nodes (e.g., inheritance, control-flow, data-flow).

Every graph used in JiTANA models appropriate graph concepts defined in the Boost Graph
Library (BGL), a de facto generic C++ graph library. This means that implementations of highly
optimized generic graph algorithms are already available for use by applications on the graphs
defined in JiTANA without modification. It also means that new algorithms can be implemented
for a concept, rather than for a specific type, so that they can be used with any type modeling
the same concept. Analyses can also be performed on any machine on which the BGL is
installed.
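
To make "implemented for a concept, rather than for a specific type" concrete, the sketch below uses only the standard library as a stand-in for the BGL. The two requirements vertex_count(g) and successors(g, v) play the role of a much simplified graph concept; the BGL's real concepts are richer, and every name here is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Model 1 of the toy concept: an adjacency list.
struct adjacency_graph {
    std::vector<std::vector<std::size_t>> adj;
};
inline std::size_t vertex_count(const adjacency_graph& g) {
    return g.adj.size();
}
inline std::vector<std::size_t> successors(const adjacency_graph& g,
                                           std::size_t v) {
    return g.adj[v];
}

// Model 2 of the toy concept: a flat list of (source, target) edges.
struct edge_list_graph {
    std::size_t n = 0;
    std::vector<std::pair<std::size_t, std::size_t>> edges;
};
inline std::size_t vertex_count(const edge_list_graph& g) { return g.n; }
inline std::vector<std::size_t> successors(const edge_list_graph& g,
                                           std::size_t v) {
    std::vector<std::size_t> out;
    for (const auto& e : g.edges)
        if (e.first == v) out.push_back(e.second);
    return out;
}

// Generic depth-first reachability: written once against the two
// requirements above, so it works for both models unchanged.
template <typename Graph>
std::vector<bool> reachable_from(const Graph& g, std::size_t start) {
    assert(start < vertex_count(g));
    std::vector<bool> seen(vertex_count(g), false);
    std::vector<std::size_t> stack{start};
    seen[start] = true;
    while (!stack.empty()) {
        const std::size_t v = stack.back();
        stack.pop_back();
        for (std::size_t w : successors(g, v)) {
            if (!seen[w]) {
                seen[w] = true;
                stack.push_back(w);
            }
        }
    }
    return seen;
}
```

Adding a third graph representation would require only the two small accessor functions; reachable_from itself would not change.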


Table 1. Types of Handles in JiTANA

               DEX Handle                        JVM Handle
-------------  --------------------------------  --------------------------------
Class Loader   struct class_loader_hdl {
                   uint8_t idx;
               };

DEX File       struct dex_file_hdl {             N/A
                   class_loader_hdl loader_hdl;
                   uint8_t idx;
               };

Type           struct dex_type_hdl {             struct jvm_type_hdl {
                   dex_file_hdl file_hdl;            class_loader_hdl loader_hdl;
                   uint16_t idx;                     std::string descriptor;
               };                                };

Method         struct dex_method_hdl {           struct jvm_method_hdl {
                   dex_file_hdl file_hdl;            jvm_type_hdl type_hdl;
                   uint16_t idx;                     std::string unique_name;
               };                                };

Field          struct dex_field_hdl {            struct jvm_field_hdl {
                   dex_file_hdl file;                jvm_type_hdl type_hdl;
                   uint16_t idx;                     std::string unique_name;
               };                                };

In contrast, most existing tools do not use explicit graph types for their data structures. For
example, Soot and LLVM follow the traditional object-oriented approach: an object holds pointers
to other objects to imply relationships that implicitly form a graph data structure. This means that
algorithms implemented in these platforms are strongly tied to a tool’s design details, not to a graph
concept. As a consequence, even a simple algorithm such as a depth-first search algorithm must
be implemented each time it is needed, and when a new tool comes along, a library of algorithms
must be rewritten.

A handle is used to identify a virtual machine object. Table 1 lists the handle types most
frequently used in JiTANA. A handle is small in size, but unlike a pointer it does not change value
across executions; this allows us to treat handles as statically unique identifiers. The use of
handles allows the graphs generated by JiTANA to be persisted and reused. These graphs can also
be replicated for parallel analysis on computing clusters.
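
The sketch below shows one way such handles can serve as stable keys. The three struct layouts follow Table 1; the comparison operators, the as_key helper, and the execution_counts table are hypothetical additions for illustration and are not claimed to match JiTANA's own code.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <tuple>

// Handle layouts as listed in Table 1.
struct class_loader_hdl {
    std::uint8_t idx;
};
struct dex_file_hdl {
    class_loader_hdl loader_hdl;
    std::uint8_t idx;
};
struct dex_method_hdl {
    dex_file_hdl file_hdl;
    std::uint16_t idx;
};

// Hypothetical helper: flattens a method handle into
// (class loader index, DEX file index, method index).
inline std::tuple<unsigned, unsigned, unsigned> as_key(
    const dex_method_hdl& h) {
    return std::tuple<unsigned, unsigned, unsigned>(
        h.file_hdl.loader_hdl.idx, h.file_hdl.idx, h.idx);
}
inline bool operator<(const dex_method_hdl& a, const dex_method_hdl& b) {
    return as_key(a) < as_key(b);
}
inline bool operator==(const dex_method_hdl& a, const dex_method_hdl& b) {
    return as_key(a) == as_key(b);
}

// Because a handle holds only indices, it can key a table that outlives
// the process in which it was built, unlike a raw pointer.
using execution_counts = std::map<dex_method_hdl, unsigned long>;

inline void record_execution(execution_counts& counts,
                             const dex_method_hdl& h) {
    ++counts[h];
}
```

Two handles built independently from the same indices compare equal, so a table written in one run can be looked up in another.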

Table 2 lists some of the graph types used in JiTANA. There are two categories of graphs: virtual
machine (VM) graphs and analysis graphs. Virtual machine graphs are graphs that closely reflect
Footnote (Boost Graph Library): http://www.boost.org/doc/libs/develop/libs/graph/doc/
Table 2. JiTANA Graphs

Name                                   Type            Node                    Edge
-------------------------------------  --------------  ----------------------  -----------------------
Class Loader Graph                     VM Graph        Class Loader            Parent Loader
Class Graph                            VM Graph        Class                   Inheritance
Method Graph                           VM Graph        Method                  Inheritance, Invocation
Field Graph                            VM Graph        Field
Instruction Graph                      VM Graph        Instruction             Control Flow, Data Flow
Pointer Assignment Graph               Analysis Graph  Register, Alloc Site,   Assignment
                                                       Field/Array RD/WR
Context-Sensitive Call Graph           Analysis Graph  Method with Callsite    Invocation
Inter-Application Communication Graph  Analysis Graph  Class Loader, Resource  Information Flow

the structure of Java virtual machines. A node in a virtual machine graph represents a virtual
machine object (e.g., class, method) that can be created or removed only by the CLVM module
in JiTANA (described in Section 3.1.3). Modification of a node property by an analysis engine
is allowed and is one of the primary ways to track dynamic information such as code coverage.
The edge type is erased with the Boost.TypeErasure library so that analysis engines can
add edges of any type. Examples of these graphs rendered with Graphviz, an open-source graph
visualization tool, are shown in Figure 2.

Figure 2(a) displays a class loader graph for a case in which JiTANA analyzes four applications
simultaneously. Each class loader is assigned a unique ID (integers in the upper left corners) so that
classes with the same name from different applications can be distinguished. For example, both
Facebook and Instagram ship a class named Landroid/support/v4/app/Fragment; with
different method signatures (i.e., different implementations) because the Facebook app is
obfuscated with ProGuard. Class Loader 0 is the system class loader and is used to load important
system classes. Each directed edge shows the parent/child relationship between two class loaders
(e.g., the system class loader spawns off application class loaders).

Figure 2(b) displays a class graph that shows relationships between four classes; the directed
edges display subclass relationships (e.g., Lcom/instagram/.../LoadImageTask; is a
subclass of the abstract class Landroid/os/AsyncTask;).

Figure 2(c) displays a method graph that shows relationships among several methods within a
set of analyzed applications. Nodes represent methods, and edges indicate whether method calls
are direct or virtual. The numbers in the upper left corners of the nodes indicate the applications
to which the methods belong.

Figure 2(d) illustrates an instruction graph for a method. It includes both control-flow (solid
edges) and data-flow (dotted edges) information. The data-flow information is derived via
reachability analysis performed on virtual registers.

Not depicted in the figure is a field graph. JiTANA stores a list of fields as nodes, but by default
it does not add edges to this graph. This data can still be used for analysis purposes.

Analysis  graphs  are  used  by  analysis  engines  to  represent  relationships  that  cannot  be  expressed 

Footnote (Boost.TypeErasure): http://www.boost.org/doc/libs/develop/doc/html/boost_typeerasure.html

Footnote (class names): A class name in a JVM starts with ‘L’ and ends with a semicolon.

Footnote (Facebook app): https://www.facebook.com/notes/facebook-engineering/under-the-hood-dalvik-patch-for-facebook-for-android/10151345597798920
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1  1  SuperDepth 

.‘!uper_depth_clas.‘!es.dex 

2  1  Facebook 

conscrypt.dex 

okhitp.dex 

core-junit.dex 

android.testjunner.dex 

android.policy.dex 


aataiB'appwcom.raceDOOK.Kacana-i  .apxsciasses.aex 
program-eb5202dbb54c0efff  1  cO  1  c5f , .  .baa6  .dex  .dex 
program-e2ca9fdbaa4e32f97b90b376...baa6.dex.dex 
program-7feaf7c75a5305bI083al60f...baa6.dex.dex 


(a)  Class  Loader  Graph 


(b)  Class  Graph  (Subgraph) 


l_0_m244 

Ljp/biolOO/android/superdepth/GameBase; 

sgn(I)I 


0:  ENTRY  vl-v2 


9:  EXIT  vR 


(c)  Method  Graph  (Subgraph) 


(d)  Instruction  Graph  for 

GameBase .  sgn  (int)  in  SuperDepth 


Figure  2:  Illustrations  of  Various  VM  Graphs 
by virtual machine graphs. For example, the points-to analysis engine (described later on) defines a
pointer assignment graph with multiple special node types to represent the flow of pointer values.
The CLVM module does not read or write to these graphs. As such, JiTANA does not enforce
any requirements on analysis graphs, but all analysis graphs defined within JiTANA model a set
of appropriate graph concepts defined by the Boost Graph Library so that existing generic graph
algorithm implementations can be used on these graphs.


3.1.3  Class  Loader  Virtual  Machine  (CLVM) 


Based on the Java Virtual Machine Specification [20], a class must be loaded by a class loader.
A class loader is a Java class inherited from the abstract class Ljava/lang/ClassLoader; and
uses a delegation model to search for classes. Each instance of ClassLoader has a reference
to a parent class loader. When a class loader cannot find a class it needs to load, it delegates the
task to its parent class loader. Both the virtual machine and the Java code running on it participate
in this process. In the Dalvik virtual machine, this process occurs as shown in Algorithm 1. Note
that a class must be loaded from a DEX file on the file system in the Dalvik virtual machine.


Data: N: name of the class to be loaded; L_init: initiating class loader; D_L: an ordered set
      of DEX files for a class loader L.
Result: (L_def, C): a pair of the defining class loader and a pointer to the class or interface loaded.
begin
    L ← L_init;
    C ← null;
    do
        foreach D ∈ D_L do
            if N ∈ class definitions list of D then
                C ← address of loaded class;
                return (L, C);
            end
        end
        L ← parent loader of L;
    while L ≠ null;
    return (L, C);
end

Algorithm 1: Class Loading Algorithm


In JiTANA, all pointers to classes are replaced by edges, rendering relationships explicit. As a
result, the implementation of Algorithm 1 is merely an instantiation of the depth-first visit
algorithm provided by the BGL, requiring only the implementation of the actual loading of a class
from a DEX file.
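
For readers who want to trace the class loading algorithm by hand, the following is a minimal sketch written as a plain loop over parent pointers rather than as a BGL depth-first visit, so it shows the search order only and not JiTANA's edge-based implementation. The types dex_file and class_loader and the function load_class are hypothetical simplifications; a DEX file is reduced to a set of class descriptors.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <utility>
#include <vector>

// Simplified stand-in for a DEX file: only its class definitions list.
struct dex_file {
    std::set<std::string> class_defs;
};

// A class loader owns an ordered list of DEX files and refers to its
// parent loader (nullptr for the root loader).
struct class_loader {
    const class_loader* parent = nullptr;
    std::vector<dex_file> dex_files;
};

// Searches the initiating loader first, then each ancestor in turn.
// Returns (defining loader, pointer to the matching descriptor), or
// (nullptr, nullptr) when no loader in the chain defines the class.
inline std::pair<const class_loader*, const std::string*> load_class(
    const std::string& name, const class_loader* initiating) {
    for (const class_loader* l = initiating; l != nullptr; l = l->parent) {
        for (const dex_file& d : l->dex_files) {
            auto it = d.class_defs.find(name);
            if (it != d.class_defs.end()) return {l, &*it};
        }
    }
    return {nullptr, nullptr};
}
```

The returned loader is the defining loader, which may differ from the initiating loader when the class is found in an ancestor.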


3.1.4  Analysis  Engines 

A primary difference between JiTANA and Soot is that JiTANA is built to be efficient at both static
and dynamic analysis. As such, common analysis building blocks such as control-flow, data-flow,
and points-to graphs can be annotated on the fly based on incoming runtime events. Next, we
describe the approaches used to construct JiTANA’s analysis engines.
Control-Flow Analysis. The intraprocedural control-flow edges in instruction graphs (see
Figure 2(d)) are created by the DEX file parser as it creates instruction nodes. Branch and jump targets
are simply encoded as offsets from the DEX instruction. The absence of an indirect addressing mode
for jumps in the DEX instruction set renders intraprocedural control-flow analysis trivial.
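
Because every target is a fixed offset, the control-flow edges of a method can be produced in a single pass over its instructions. The sketch below illustrates this under a simplifying assumption: offsets are counted in whole instructions, whereas real DEX offsets are counted in 16-bit code units. The types flow_kind and insn and the function control_flow_edges are hypothetical and do not reflect the JiTANA parser.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// How control leaves an instruction.
enum class flow_kind { fallthrough, jump, branch, method_exit };

struct insn {
    flow_kind kind;
    int target_offset;  // relative target; used only by jump and branch
};

using edge = std::pair<std::size_t, std::size_t>;

// One pass over the code: a jump contributes its target edge, a branch
// contributes its target edge and then its fall-through edge, and an
// ordinary instruction contributes only its fall-through edge.
inline std::vector<edge> control_flow_edges(const std::vector<insn>& code) {
    std::vector<edge> edges;
    for (std::size_t i = 0; i < code.size(); ++i) {
        const insn& in = code[i];
        if (in.kind == flow_kind::jump || in.kind == flow_kind::branch) {
            const long target = static_cast<long>(i) + in.target_offset;
            assert(target >= 0 &&
                   static_cast<std::size_t>(target) < code.size());
            edges.emplace_back(i, static_cast<std::size_t>(target));
        }
        const bool falls_through = in.kind == flow_kind::fallthrough ||
                                   in.kind == flow_kind::branch;
        if (falls_through && i + 1 < code.size())
            edges.emplace_back(i, i + 1);
    }
    return edges;
}
```

No fixed-point iteration or pointer analysis is needed here, which is what makes the intraprocedural case simple compared with instruction sets that allow indirect jumps.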

Our interprocedural control-flow analysis includes both direct call edges and virtual call edges
in a method graph, as shown in Figure 2(c). However, the actual target of a virtual call edge cannot
be accurately computed without consulting the virtual dispatch tables (vtables) that are used to
support late-binding features such as inheritance. As such, our analysis is not sound because it
ignores the vtables. Furthermore, our analysis is incomplete because it ignores reflection. These
two issues are common among static program analysis tools for Java.

To  improve  the  soundness  and  the  completeness  of  its  analysis,  Jitana  applies  additional  static 
analyses  such  as  points-to  analysis  to  determine  the  actual  type  of  a  method’s  receiver.  It  can 
also  incorporate  dynamic  execution  information  from  the  virtual  machine  on  the  device  to  identify 
reflection  targets  and  annotate  graphs  on  the  fly. 


Data-Flow Analysis. JiTANA provides a few data-flow analysis engines. One common analysis
supported is reaching definitions analysis, used to generate def-use pairs of registers in the
instruction graph as shown in Figure 2(d) (dotted edges). The monotone data-flow algorithm used in the
reaching definitions algorithm is implemented as a generic function, and therefore can be used to
generate different types of data-flow analyses such as available expressions or live variable analysis
simply by defining appropriate functors. It also works on any graph types that model the concepts
required for the control-flow graphs.
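
The idea of one generic solver parameterized by functors can be sketched as follows. This is an illustrative worklist solver over successor lists using only the standard library, not JiTANA's implementation; solve_forward, reaching_definitions, succ_lists, and def_set are hypothetical names. The transfer and join functors are the only analysis-specific parts.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <set>
#include <vector>

using succ_lists = std::vector<std::vector<std::size_t>>;

// Generic forward worklist solver. `transfer(v, in)` computes the fact
// leaving node v; `join(a, b)` merges facts from several predecessors.
// Returns the fact that holds on entry to each node.
template <typename Fact, typename Transfer, typename Join>
std::vector<Fact> solve_forward(const succ_lists& succ, const Fact& bottom,
                                Transfer transfer, Join join) {
    const std::size_t n = succ.size();
    std::vector<std::vector<std::size_t>> pred(n);
    for (std::size_t v = 0; v < n; ++v)
        for (std::size_t w : succ[v]) pred[w].push_back(v);

    std::vector<Fact> in(n, bottom), out(n, bottom);
    std::deque<std::size_t> work;
    for (std::size_t v = 0; v < n; ++v) work.push_back(v);

    while (!work.empty()) {
        const std::size_t v = work.front();
        work.pop_front();
        Fact merged = bottom;
        for (std::size_t p : pred[v]) merged = join(merged, out[p]);
        in[v] = merged;
        Fact result = transfer(v, in[v]);
        if (!(result == out[v])) {  // changed: revisit successors
            out[v] = result;
            for (std::size_t w : succ[v]) work.push_back(w);
        }
    }
    return in;
}

using def_set = std::set<std::size_t>;

// Reaching definitions as one instantiation. def_reg[i] is the register
// written by instruction i, or -1 when instruction i writes none.
inline std::vector<def_set> reaching_definitions(
    const succ_lists& succ, const std::vector<int>& def_reg) {
    assert(succ.size() == def_reg.size());
    auto transfer = [&def_reg](std::size_t v, const def_set& in) -> def_set {
        if (def_reg[v] < 0) return in;
        def_set out;
        for (std::size_t d : in)
            if (def_reg[d] != def_reg[v]) out.insert(d);  // kill old defs
        out.insert(v);                                    // gen this def
        return out;
    };
    auto join = [](def_set a, const def_set& b) -> def_set {
        a.insert(b.begin(), b.end());  // set union
        return a;
    };
    return solve_forward(succ, def_set{}, transfer, join);
}
```

A different forward analysis would reuse solve_forward with a different fact type and functor pair; a backward analysis such as live variables would additionally need the edges reversed.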

In  addition  to  static  data-flow  analyses,  JiTANA  can  incorporate  information  from  the  virtual 
machine.  For  example,  a  dynamic  taint  analysis  can  be  performed  on  the  virtual  machine  to  track 
data  flows  from  sources  to  sinks.  The  results  of  this  analysis  can  be  rendered  as  edges  on  a  static 
data-flow  graph  to  provide  a  more  meaningful  view  of  the  flow  of  data. 


Points-to Analysis. In Java, most function calls are made using a dynamic dispatch mechanism.
Therefore, knowing the actual type of an object in a pointer variable is essential for any
interprocedural analysis. Jitana provides a points-to analysis engine. The algorithm is similar to
the one used by Spark, a points-to analysis framework in Soot, with the following differences:


•  It  uses  register  def-use  information  from  the  data-flow  analysis  engine  to  add  flow  sensitivity. 
This  improves  the  precision  of  the  analysis,  especially  because  the  same  register  may  be 
reused  within  a  method  in  the  Dalvik  architecture. 


•  It operates on a pair of graphs: a pointer assignment graph (PAG) and a context-sensitive
call graph (CSCG). The PAG is conceptually the same as the one used in Spark, except it
is defined as a BGL graph in order to use existing generic algorithm implementations. The
CSCG is a call graph specific to a given set of entry points. These graphs are provided as
input to the analysis along with entry point information. It is common to have multiple entry
points executed in sequence in event-based Android applications; in these cases, the
points-to analysis may be called multiple times for each entry point on the same pair of graphs.


3.1.5  Virtual  Machine  Modifications 

Jitana and the Dalvik VM are connected using the Java Debug Wire Protocol (JDWP) over the
Android Debug Bridge (ADB). The JDWP is a standard protocol for attaching a debugger to a
virtual machine. With its pre-defined commands, we can observe and control program execution.

These  pre-defined  commands,  however,  are  not  sufficient  for  all  analyses.  For  example,  we  need 
to  add  a  new  command  to  retrieve  code  coverage  information.  With  our  modified  virtual  machine, 
the  number  of  executions  for  all  basic  blocks  of  the  non-system  DEX  instructions  are  counted 
automatically  in  interpreter  mode;  the  new  command  dumps  the  delta  of  the  execution  counters. 

This particular modification to the virtual machine is minimal: 127 lines of C++ and 66 lines
of ARM assembly code were added. The C++ code handles the additional JDWP communication.
It also allocates the same number of virtual memory pages for the counters when a DEX file is
mapped into memory and records the offset between them. The ARM assembly code added
to the interpreter increments the counter whenever a jump or branch instruction is executed. The
address of the counter is given by adding the address of the DEX instruction and the offset to the
counter pages.
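
The addressing scheme and the delta dump can be modeled in a few lines. The sketch below is only a toy model in Python, not the C++ or ARM code added to the VM; the class name CoverageCounters, the dictionary used in place of real memory pages, and the example addresses are all assumptions made for illustration.

```python
class CoverageCounters:
    """Toy model: one counter per DEX instruction address, at a fixed offset."""

    def __init__(self, dex_base, counter_base):
        # Offset recorded when the DEX file is mapped (hypothetical addresses).
        self.offset = counter_base - dex_base
        self.counters = {}   # counter address -> execution count
        self.snapshot = {}   # counts at the time of the previous dump

    def counter_address(self, insn_address):
        # Counter address = DEX instruction address + offset to the counter pages.
        return insn_address + self.offset

    def on_branch(self, insn_address):
        addr = self.counter_address(insn_address)
        self.counters[addr] = self.counters.get(addr, 0) + 1

    def dump_delta(self):
        """Return only the counts that changed since the previous dump."""
        delta = {}
        for addr, count in self.counters.items():
            change = count - self.snapshot.get(addr, 0)
            if change:
                delta[addr - self.offset] = change  # report by instruction address
        self.snapshot = dict(self.counters)
        return delta

counters = CoverageCounters(dex_base=0x1000, counter_base=0x8000)
counters.on_branch(0x1010)
counters.on_branch(0x1010)
counters.on_branch(0x1020)
first_dump = counters.dump_delta()
counters.on_branch(0x1010)
second_dump = counters.dump_delta()
```

Because the counter location is a pure function of the instruction address, no lookup table is needed on the hot path, which is consistent with the small amount of assembly reported above.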

The  code  generated  by  the  just-in-time  compiler  for  hot  traces  remains  unmodified;  this  renders 
the  overhead  of  the  modifications  unnoticeable  to  the  user.  The  Dalvik  VM  executes  hot  traces 
in  interpreter  mode  in  the  entry  even  if  their  compiled  code  is  on  the  code  cache,  so  we  can  still 
obtain  correct  code  coverage  data  and  note  the  relative  hotness  of  a  trace. 

3.2  The  APEx  Framework 

The APEx framework combines gray-box GUI testing with concolic execution to find valid event
sequences for specific target locations in the code. An overview of APEx is shown in Figure 3.
The first step of the input generation process is a depth-first GUI traversal that dynamically builds
a GUI model and an event-handler map. We then analyze previously uncovered program paths, identify
important paths, and use concolic execution to generate event sequences that reach those paths.

3.2.1  GUI  Exploration 

Our GUI exploration uses a depth-first strategy to traverse GUI layouts and exercise relevant events
in each layout until all the layouts and events are explored. The workflow of GUI traversal is shown
in Figure 4. Events are generated using a gray-box approach. AndroidManifest.xml provides
information such as the package name, activity class names, the MainActivity name, and intent
filters that can be used to perform automatic GUI traversal. The GUI traversal process is built on
the UIAutomator program in the Android SDK. UIAutomator can take screenshots and dump the current
layout hierarchy of an Android device at any given time. By checking the layout hierarchy and
applying events in a lock-step manner, the GUI traversal process keeps exploring new layouts and
records the layout transitions until all layouts and all events are explored. The GUI traversal
process generates two important outputs: the GUI model and the Event-Handler map.

GUI model contains all the layouts and events explored during the traversal. Each layout is
stored as a node, and each event is stored as a directed edge that starts from the layout
node before applying the event and ends at the layout node after applying the event. The layout
hierarchy information retrieved from UIAutomator contains various attributes of all the View
objects that are showing on the device screen. Such information is used to: (i) create layout nodes,
(ii) find applicable events for each View object, and (iii) compare layouts before and after
applying an event. The GUI model is generated in a depth-first manner. The traversal keeps exploring
layouts and events until arriving at a layout where all the events keep the same layout. Then the
app restarts and revisits the most recent layout that still has unexplored events, and continues the
traversal process. The GUI traversal finishes when there are no unexplored events in any layout in
the GUI model. It is worth noting that the GUI model generated from this traversal stage might
be incomplete due to the dynamic nature of our approach. It is possible that certain GUI layouts
can only be triggered by certain complex event sequences. Such GUI layouts will be missing from
our GUI model. We compensate for this possible drawback in a later stage by identifying uncovered
GUI transition statements and generating event sequences to trigger the GUI transition.

Figure 3: Overview of APEx Framework
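
The depth-first construction of the GUI model can be sketched against a simulated app. The sketch below does not drive UIAutomator; it assumes, purely for illustration, that the app is a dictionary mapping each layout to its events and their resulting layouts. The function name explore_gui and the toy app are hypothetical, and app restarts are represented only by the recorded event path to each layout.

```python
def explore_gui(app, main_layout):
    """Build a GUI model: layouts are nodes, events are (source, event, target) edges."""
    edges = []
    unexplored = {main_layout: list(app[main_layout])}
    path_to = {main_layout: []}   # events to replay after a restart to reach a layout
    stack = [main_layout]
    while stack:
        layout = stack[-1]
        if not unexplored[layout]:
            # No events left here; a real run would restart the app and replay
            # path_to[...] for the most recent layout with unexplored events.
            stack.pop()
            continue
        event = unexplored[layout].pop(0)
        target = app[layout][event]
        edges.append((layout, event, target))
        if target not in unexplored:          # first visit: go deeper
            unexplored[target] = list(app[target])
            path_to[target] = path_to[layout] + [event]
            stack.append(target)
    return edges, path_to

# Hypothetical three-layout app.
app = {
    "Main": {"click_settings": "Settings", "click_about": "About"},
    "Settings": {"toggle": "Settings", "back": "Main"},
    "About": {"back": "Main"},
}
edges, path_to = explore_gui(app, "Main")
```

Any layout that is reachable only through an event sequence the traversal never tries will be absent from `edges`, which mirrors the incompleteness noted above.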

Event-Handler map pairs an event to an event handler method (e.g., Button1 to onClick1()).
Instrumentation is used to monitor and capture the mapping during GUI traversal. Any method
in the app that has the signature of an event handler is instrumented to print out a message when
it starts executing and when it returns. When an event is applied during the GUI traversal, the
corresponding event handler method information is printed out to the system console. Usually this
event handler registration information can be obtained statically from the layout XML files or the
Java code. However, there are cases where this information cannot be easily obtained
statically. For example, View objects can be defined and created at runtime rather than being
predefined in the XML or Java code; in such cases, rather complicated static analysis may be needed
to determine their life cycles and which layouts these View objects belong to. Therefore, in our
GUI traversal process, the events and their registered event handler methods are all discovered
dynamically.

3.2.2  Symbolic  Execution  on  Dalvik  Bytecode 

Based on the Event-Handler map provided by GUI traversal, symbolic execution is performed on
each event handler method. Symbolic execution tries to traverse all the execution paths caused by
branch statements such as if and switch. The GUI traversal process has already concretely executed
one path for each event handler method. For event handler methods that have multiple execution
paths, the rest of the execution paths are executed symbolically. As a result, a Path Summary is
generated for each execution path.

Figure 4: GUI Traversal Workflow

Path Summary contains 3 components: execution log, symbolic states, and path constraints. The
execution log is the sequence of instructions that were executed from the start of an event handler
to its return. The execution log includes not only the instructions in the
event handler method, but also the instructions that were executed in nested method invocations.
The symbolic states are the states of all the global variables at the end of the execution path. Since
event handler methods in Android generally have a single parameter, which is the View object
corresponding to the event, global variables (usually field members of global objects) are considered
as symbols. The path constraints are a set of constraints that must all be satisfied in order to
revisit a specific execution log. At the beginning of execution, the path constraints contain a single
constraint true. When an if statement is executed, there are two different
outcomes depending on whether the condition in the if statement is satisfied or not. When the
direction of the if statement is decided, the corresponding constraint is added to the path
constraints.
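
The way path constraints accumulate can be shown with a short sketch. It assumes, as a simplification that is not stated in the report, that the branch conditions are encountered in a fixed order and are represented as plain strings; the function name enumerate_paths and the example conditions are hypothetical.

```python
from itertools import product

def enumerate_paths(conditions):
    """Return the path constraints for every combination of branch outcomes."""
    paths = []
    for outcome in product([True, False], repeat=len(conditions)):
        constraints = ["true"]  # every path starts with the single constraint true
        for cond, taken in zip(conditions, outcome):
            # Record the condition itself or its negation, depending on the direction.
            constraints.append(cond if taken else f"!({cond})")
        paths.append(constraints)
    return paths

paths = enumerate_paths(["counter > 3", "mode == 0"])
```

With n independent branches this produces 2^n constraint lists, which is the growth that the target prioritization described later is meant to contain.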

In our implementation, the symbolic states and path constraints are represented in the form of
Abstract Syntax Trees (AST), using keywords to indicate symbols. The Dalvik bytecode instruction
set contains 219 different instructions. Among them are many instructions that perform the
same function but reflect different operand sizes or data types. For example, there are 7
instructions for loading an element from an array: aget, aget-wide, aget-object, aget-boolean,
aget-byte, aget-char, aget-short. Our symbolic execution parses these instructions using the same
keyword $aget. Overall, we have created 17 different keywords for the whole Dalvik bytecode
instruction set. Figure 5 shows an example of the symbolic state expression format of the bytecode
instruction sput, which writes the value of a static field. In this example, the root node of the AST
has the name “=”, indicating this expression is a symbolic state. The left child of the root node has
a keyword $static-field, representing a symbolic value with a unique signature
com.example.ClassA.Field1.

//Java source code
ClassA.Field1 = "sample string";

//Dalvik bytecode
const-string v0, "sample string"
sput v0, Lcom/example/ClassA;->Field1:Ljava/lang/String;

Figure 5: An Example of Symbolic State Expression Format
The symbolic execution procedure takes a method signature as input, and outputs a list of path
summaries for all the different execution paths. We have implemented the symbolic execution in
a VM-like structure. The symbolic VM contains a heap and a method stack. In the method stack,
each method is assigned a group of registers that store local variables. The value stored in
a register can be either a literal value or a reference value. Reference values usually represent the
address of an object on the heap. We implemented the heap as a list of objects with a symbolic
value and field members. At the end of each symbolic execution, the states of global variables are
collected from the heap. Figure 6 shows an example of the symbolic VM state during runtime. In
this example, method1 is being executed and sits on top of the method stack. The instruction in line
0 writes the literal value 0x5 into register v0. The instruction in line 1 creates an object on the
heap, and puts the object's address obj1 into register v1. The instructions in lines 2 and 3 then
copy the value of register v1 and write it to two instance fields, field1 and field2, of the this
instance of class MainActivity.
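
The VM state described above can be reproduced with a small model. This is an illustrative Python sketch rather than the APEx implementation; the class SymbolicVM, the "$this" placeholder for the receiver object, and the dictionary-based heap are assumptions.

```python
class SymbolicVM:
    """Toy symbolic VM with a heap and a method stack of register frames."""

    def __init__(self):
        self.heap = {}    # reference -> {"type": ..., "fields": {...}}
        self.stack = []   # frames: {"method": ..., "regs": {...}}
        self._next_id = 1

    def push_frame(self, method, registers):
        self.stack.append({"method": method, "regs": dict(registers)})

    @property
    def regs(self):
        return self.stack[-1]["regs"]   # registers of the method on top of the stack

    def const(self, dst, literal):
        self.regs[dst] = literal

    def new_instance(self, dst, type_name):
        ref = f"obj{self._next_id}"
        self._next_id += 1
        self.heap[ref] = {"type": type_name, "fields": {}}
        self.regs[dst] = ref            # the register holds a reference value

    def iput(self, src, obj_reg, field_name):
        ref = self.regs[obj_reg]
        self.heap[ref]["fields"][field_name] = self.regs[src]

vm = SymbolicVM()
vm.heap["$this"] = {"type": "MainActivity", "fields": {}}  # assumed receiver object
vm.push_frame("method1", {"p0": "$this"})
vm.const("v0", 0x5)              # line 0
vm.new_instance("v1", "File")    # line 1
vm.iput("v1", "p0", "field1")    # line 2
vm.iput("v1", "p0", "field2")    # line 3
```

After these four steps both fields of the receiver hold the same reference, so a later constraint on either field resolves to the same heap object.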

Unlike concolic execution in single entry point programs, concolic execution in Android
programs is unable to “flip” a path constraint at the end of one execution and directly provide a
concrete input for the next execution. The first reason is that Android programs have multiple entry
points. The second reason is that the entry method parameters are not the only symbolic values
that decide the path constraints. In order to find the symbolic values that can satisfy the “flipped”
constraint, we often need to look into the path summaries of other event handlers and find the ones
whose symbolic states at the end of execution can satisfy that constraint.

3.2.3  Event  Sequence  Generation 

The event sequence generation takes specific code targets as input, and outputs a list of event
sequences that can potentially reach the targets. First, a code target is matched with the execution
logs of all the path summaries. If an execution log is found to contain the target, then the
corresponding path summary can trigger this specific target. If the path summary is generated
concretely, then the event sequence of this path summary is the input to execute the target. But if
the path summary is generated symbolically, a constraint solving process is required.

// Dalvik bytecode
0 const v0, 0x5
1 new-instance v1, File
2 iput v1, p0, <field1>
3 iput v1, p0, <field2>

Figure 6: An Example of the APEx Symbolic Runtime VM State

Constraint Solving starts from a desired path summary by adding preceding path summaries to
the sequence, then keeps solving constraints of newly added path summaries until the main entry
of the application is reached. Although a symbolically generated path summary does not have a
concrete event sequence, the “final” event in its event sequence is already known. In order to find
the preceding events, the path constraints of this path summary must all be satisfied.

Generally, the constraint solving process first searches for relevant symbolic states from other
path summaries that can potentially satisfy each constraint; the CVC4 SMT solver is then used to
determine whether the relevant symbolic states satisfy the path constraints. When path summaries
whose symbolic states satisfy all the path constraints are found, the event sequences of these path
summaries are inserted in front of the existing event sequence. These newly found path
summaries then become the new subject of constraint solving. This process repeats until there are no
new path constraints to solve. As a result, a list of event sequences is generated.
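
The backward search can be sketched as follows. This is a deliberately simplified model and not the APEx algorithm: constraints and end states are plain variable-to-value dictionaries, and a direct equality check stands in for the CVC4 query. The function name generate_sequences, the depth bound, and the three example summaries are hypothetical.

```python
def generate_sequences(summaries, target, max_depth=5):
    """Prepend summaries whose end states satisfy the pending constraints."""
    results = []

    def extend(sequence, pending, depth):
        if not pending:                      # nothing left to satisfy
            results.append([s["event"] for s in sequence])
            return
        if depth == max_depth:
            return
        for cand in summaries:
            states = cand["states"]
            # Skip candidates that would overwrite a needed value with a wrong one.
            if any(k in states and states[k] != v for k, v in pending.items()):
                continue
            satisfied = {k for k in pending if k in states}
            if not satisfied:
                continue
            remaining = {k: v for k, v in pending.items() if k not in satisfied}
            remaining.update(cand["constraints"])  # the candidate's own constraints
            extend([cand] + sequence, remaining, depth + 1)

    extend([target], dict(target["constraints"]), 0)
    return results

login = {"event": "click_login", "constraints": {}, "states": {"logged_in": True}}
sync = {"event": "toggle_sync", "constraints": {"logged_in": True},
        "states": {"sync": True}}
upload = {"event": "click_upload", "constraints": {"sync": True}, "states": {}}

sequences = generate_sequences([login, sync, upload], upload)
```

Sequences produced this way are only candidates; as in the report, they would still need to be validated on a device before being trusted.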

In practice, there are many types of path constraints that SMT solvers cannot directly solve.
For instance, when the system API GregorianCalendar.get(Calendar.HOUR_OF_DAY) is used in a branch
statement, there are no GUI events that can satisfy this constraint. More generally, the two most
common types of Android system APIs that we encounter in path constraints are: (1) APIs
for accessing OS settings and environment variables, and (2) APIs for accessing GUI widget
properties. For these types of APIs, we have developed a signature-based solver that can generate a
system event for recognizable API signatures in the constraint. This approach is implemented by
manually building the signature pool and corresponding system events case by case.

Target Prioritization is used to deal with path explosion, a major challenge of symbolic execution.
The number of execution paths in a method grows exponentially with the number of branching
statements. In order to avoid path explosion, we implemented a target prioritization mechanism in
the sequence generation process.

Each symbolic path summary is assigned a priority factor. A path summary has higher
priority if it meets any of the conditions below:

•  its execution log contains the target

•  its execution log contains GUI transition statements

•  its symbolic states have more symbol variables

We implemented the priority as a linear calculation with a weight ratio of 1. This feature
guarantees that more relevant path summaries are always solved first.
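
One reading of the priority calculation is sketched below. The report does not give the formula, so this sketch assumes that each of the three conditions contributes a term and that all terms carry the same weight of 1; the function name priority, the field names, and the example summaries are hypothetical.

```python
def priority(summary, target, gui_transitions):
    """Linear score with equal weights (an assumed interpretation)."""
    log = summary["execution_log"]
    score = 0
    score += 1 if target in log else 0                              # contains the target
    score += 1 if any(insn in gui_transitions for insn in log) else 0  # GUI transition
    score += len(summary["symbols"])                                # more symbol variables
    return score

gui_transitions = {"startActivity"}
summaries = [
    {"name": "A", "execution_log": ["i1", "i2"], "symbols": ["x"]},
    {"name": "B", "execution_log": ["i3", "startActivity"], "symbols": []},
    {"name": "C", "execution_log": ["i9", "startActivity"], "symbols": ["x", "y"]},
]

# Solve higher-priority summaries first.
ordered = sorted(summaries, key=lambda s: priority(s, "i9", gui_transitions),
                 reverse=True)
```

Under this reading a summary that contains the target and many symbols is always processed before one that merely touches a GUI transition.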

Sequence Validation is used to cull out infeasible paths. It is needed because during the sequence
generation process, there is no validation of the event sequences. Some of the generated event
sequences might be infeasible or incorrect, for reasons such as errors in the GUI model or unsolved
constraints involving system APIs. Therefore, it is necessary to have a validation process.

Every generated event sequence is applied to the app while running on an Android device. We
determine the correctness of an event sequence by comparing the concrete execution log to the
corresponding symbolic path summary's execution log. This is implemented by instrumenting the
beginning and end of each basic block of every method. We can retrieve the concrete execution
log via logcat during validation. If the concrete execution log matches the corresponding
symbolic path summary's execution log, then the event sequence is a correct sequence.
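
The comparison step can be sketched as a small log parser. The report does not specify the instrumentation's output format, so the APEX_TRACE tag, the "enter" and "exit" markers, and the block identifiers below are all hypothetical; the sketch only shows the shape of the check.

```python
def concrete_log(logcat_lines, tag="APEX_TRACE"):
    """Extract the sequence of entered basic blocks from logcat-style lines."""
    blocks = []
    for line in logcat_lines:
        if tag not in line:
            continue                       # unrelated log output
        _, _, payload = line.partition(tag + ":")
        kind, block = payload.split()
        if kind == "enter":
            blocks.append(block)
    return blocks

def is_valid_sequence(logcat_lines, expected_blocks):
    """An event sequence is accepted if the concrete log matches the symbolic one."""
    return concrete_log(logcat_lines) == list(expected_blocks)

lines = [
    "I/APEX_TRACE: enter onClick1#0",
    "D/OtherTag: unrelated message",
    "I/APEX_TRACE: exit onClick1#0",
    "I/APEX_TRACE: enter onClick1#2",
    "I/APEX_TRACE: exit onClick1#2",
]
ok = is_valid_sequence(lines, ["onClick1#0", "onClick1#2"])
bad = is_valid_sequence(lines, ["onClick1#0", "onClick1#1"])
```

An exact match is a strict criterion; a sequence that reaches the target through a different block order would be rejected by this check.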

4.0  RESULTS  AND  DISCUSSIONS 

In this section, we report the performance evaluation of JiTANA and APEx.

4.1  Performance  Evaluation  of  JiTANA 

To evaluate the performance of Jitana, we applied it to five real-world apps. Table 3 lists these
apps, together with data on their size and complexity.

On each of these apps, we measured the execution times required to perform (i) APK loading,
(ii) call-graph generation, and (iii) reaching definitions analysis and addition of def-use edges to
instruction graphs. We collected measurements for these tasks five times for each app; our results
present averages across these five. We recorded execution times in seconds, and these included the
times needed to process all system classes necessary to load the app classes.

Note that Facebook, unlike the other four apps, dynamically generates and loads secondary DEX
files. Jitana is able to capture and analyze these files automatically using information from the
virtual machine. Thus, the times reported for Facebook include the times required to analyze these
secondary files.

Table 3 reports the three classes of execution times in the columns headed “Loading”, “Call
Graph”, and “R. Defs”, respectively, along with the total time taken to perform all three (Total).
These numbers were all gathered on a workstation with a 3.2 GHz Core i5 and 32 GB DDR3 RAM
running Apple OS X 10.11.3 (El Capitan).

As Table 3 shows, JiTANA generated the basic analysis building blocks for the five apps in
overall times ranging from 0.33 seconds for SuperDepth to 11.4 seconds for Facebook.

We also attempted to perform the same tasks using Soot, but found that developing a methodology
to perform a direct comparison was challenging. As such, we do not report numbers for Soot
in Table 3; instead, we report our observations and highlight the key differences that make a direct
comparison difficult. First, JiTANA was able to analyze more classes in four out of five apps.
Dexpler translates only classes that are part of the APK, and does not consider any system classes
needed to initialize the apps. As such, we found that Jitana analyzed 4,942 classes for Instagram
but Dexpler passed only 4,641 classes on to Soot to analyze. There were also differences in the
numbers of analyzed classes in SuperDepth, Google Earth, and Twitter. These differences render
fair comparisons difficult. Second, Dexpler does not consider dynamically generated classes,
and Facebook generates three additional large DEX files as it initializes. In this case Dexpler
considers only the main DEX file, which has 6,350 classes, whereas JiTANA included all four DEX
files, analyzing 23,621 classes. Again, this renders fair comparisons difficult.

Despite these difficulties, we do note the following. Soot requires a translation process from
DEX to Jimple, and this process alone requires more time than the entire analysis time required by
JiTANA. For example, it took Dexpler 23 seconds just to translate the Facebook app and Soot
21 more seconds to analyze the app with three missing dynamically generated DEX files. On the
other hand, JiTANA needed only 11.4 seconds to analyze all four DEX files in the app (and with
almost four times as many classes to analyze).

Table 3. Analysis Times Measured When Applying Jitana to Five Real-World Apps

                               # of                                    Time (seconds)
Name                 Classes   Methods    Fields        LoDC   Loading   Call Graph   R. Defs   Total
SuperDepth                68     2,035     1,355      50,779      0.16         0.05      0.12    0.33
Google Earth           1,213    10,698     4,137     136,679      0.58         0.15      0.28    1.01
Twitter                3,675    34,390    15,715     442,243      2.03         0.49      0.88    3.40
Instagram              4,942    37,747    16,514     477,700      2.11         0.53      0.94    3.58
Facebook (katana)     23,621   130,428    76,443   1,548,801      6.81         1.75      2.84   11.40

In summary, our investigation reveals differences between the two frameworks that can be
summarized as follows. First, translation overhead can be high when Soot is used to analyze Android
apps. Second, the hybrid design of JiTANA allows it to analyze more classes, including classes
in the APK, system classes, and dynamically generated classes.

4.2  Performance  Evaluation  of  APEX 

We tested APEx on a total of 14 apps to examine its performance. The test apps
include 12 apps from APAC engagements and 2 benchmark apps: TippyTipper and Dragon. These
apps are mostly malware samples that utilize a variety of anti-analysis techniques to evade security
analysis. We evaluated the effectiveness of APEx in terms of code coverage and target coverage
using these subjects.

4.2.1  Code  Coverage 

Our code coverage is based on bytecode statement coverage. Compared to method coverage, this
fine-grained metric can better represent the percentage of different program paths being explored
by our concolic execution engine. The total number of bytecode statements is measured statically
using apktool. We instrumented these 11 apps and monitored logcat output during runtime to
measure the number of covered bytecode instructions. Since most of the test apps contain
third-party libraries in their binaries, we manually identified and excluded this library code from
the results.

Table 4 shows APEx's code coverage results on the selected apps. The columns show the
total number of bytecode lines and the covered bytecode lines during the input generation. The last
column shows the number of restarts during the GUI exploration stage. Manual examination
of the low-coverage apps shows that the main reason for the low coverage is that our constraint
solver is unable to solve certain constraints involving system APIs; for example, Math.floor(double)
is not supported by the SMT solver. Effectively dealing with the system APIs and libraries is still a
great challenge and continues to be the focus of APEx development.

Table 4. Code Coverage of APEx On 11 Apps

App Name          LoDC    Line Coverage    Restart Times
Dragon             335    307 (92%)                   28
MunchLife          631    396 (53%)                    8
TippyTipper       4520    2640 (58%)                  20
CalcA              789    611 (77%)                    2
CalcC              796    602 (76%)                    2
CalcE             1210    663 (55%)                    2
FullControl       3044    1229 (40%)                   2
KitteyKittey       723    397 (55%)                    3
PasswordSaver      842    478 (61%)                    2
SMS Backup         778    478 (61%)                    2
SourceViewer       382    245 (64%)                    2

Generating Input for Specific Targets. To test APEx's effectiveness in generating input for
specific targets, we picked 8 test apps and specified various code call sites within those apps as
targets for APEx. The targets are determined using JITANA's coverage report from random
testing and unit testing on the test apps. First we identified basic blocks that have not been
executed; these blocks represent hard-to-reach targets. We then selected the first instruction from
each of those blocks and used those bytecode instructions as targets for APEx to generate input
sequences.


Table 5. Target Coverage of APEx on 8 Apps

App Name         Targets Reached    Max Sequence Length
Dragon           5/5 (100%)                           6
Munchlife        20/29 (69%)                          8
TippyTipper      16/57 (28%)                          5
BattleStat       10/88 (11%)                          7
rLurker          12/141 (9%)                          5
AudioSidekick    12/79 (15%)                          4
AWeather         4/170 (2%)                           3
Engologist       6/129 (4%)                           3

If APEx is able to generate input sequences for a target and at least one of the sequences is
validated, then this target is considered to be reached. The results are shown in Table 5. The
columns show first the percentage of reached targets, then the number of events in the
longest event sequence generated for each app.

The reason for the poor performance of APEx on some of the test apps is still mainly the
unsolved API constraints. With our current signature-based API constraint solver, we can only
deal with a limited set of APIs. Therefore, we could not complete the sequence generation process
for the path summaries that contain APIs that we do not recognize.

Table 6. Comparison in Target Coverage Between Collider and APEx.

App Name       Target coverage by Collider    Target coverage by APEx
TippyTipper    7/16 (44%)                     16/57 (28%)
Munchlife      6/10 (60%)                     20/29 (69%)

Next we compare the target sequence generation result of APEx with that of Collider, a
state-of-the-art concolic execution engine. In the evaluation of Collider, the target lines were
selected from unreached bytecode lines after running both Monkey and a crawler. The targets used by
Collider were not exactly the same as those used by APEx; however, all targets were deemed hard to
reach. In TippyTipper, Collider was able to reach 7 targets out of 16, while APEx reached 16
out of 57. In Munchlife, Collider was able to reach 6 out of 10 targets, while APEx reached 20 out
of 29. The comparison in target coverage is shown in Table 6. We can see that APEx has an overall
lower coverage rate while reaching more targets than Collider. Without a thorough comparison,
it is difficult to determine which tool performs better. However, Collider's sequence generation
requires a manually built GUI model, while APEx does not make any assumptions or require
manual effort in building the GUI model. Overall, despite the problems and limitations, APEx
is easier to deploy than Collider, a state-of-the-art concolic execution engine, while reaching
more targets in the same apps.

5.0  APPLICATIONS  OF  PROPOSED  FRAMEWORKS 

In this section, we report the results of applying the proposed framework to address emerging
security challenges. We investigated five scenarios in this study. In the first scenario, we explore
an approach that uses runtime information to guide input generation through concolic execution.
In the second scenario, we compare the performance of JiTANA on an analysis task to that of a
benchmark approach. We then present an example in which JiTANA supports the creation of a
real-time visualization engine to provide real-time feedback about the results of an analysis. In the
fourth scenario, we investigate the scalability of JiTANA and its potential applicability to
bring-your-own-device (BYOD) environments. Finally, we illustrate how JiTANA can perform analysis
of dynamically loaded code.

5.1  Runtime  Guided  Input  Generation  through  Concolic  Execution 

One  of  the  main  purposes  of  developing  input  generation  techniques  is  to  help  dynamic  malware 
analysis  by  generating  input  sequences  to  expose  suspicious  activities.  In  this  application,  we 
first  try  to  validate  a  hypothesis  that  malicious  code  exists  in  rarely  executed  paths.  We  used 
JITANA  to  collect  VM  internal  runtime  information  and  identify  rarely  executed  paths,  then  use 
these  rarely  executed  paths  in  applications  that  APEx  can  support  as  the  targets  for  the  input 
generation  process. 

In order to identify rarely executed paths, we ran test apps on a Nexus 7 device with a modified
Dalvik VM that is connected to workstations running JITANA. We used both monkey and unit
test cases to drive the execution of test apps. We configured the monkey to generate 20,000 random
events from the main activity of each test app; we then supplemented these random test cases with
unit test cases to reach code coverage of 60% or more.

Table 7. Executed Malware Locations Across Engagements

Engagement    # of apps    # of known malware    # of executed malware       %
1A-1C                18                    16                        9    56.0
3A-3B                 3                    15                        6    40.0
4A-6A                 3                     6                        3    50.0
Total                24                    37                       18    49.0

During the test, Jitana continuously receives VM-internal runtime information from the modified
Dalvik VM, including the bytecode execution log and the bytecode traffic into the JIT compiler.
Jitana overlays this dynamic information on top of static information and generates a bytecode
coverage report for the test app after each run. The coverage report consists of bytecode traces.
Each trace contains a bytecode instruction's location (class signature, method signature, bytecode
index) and an execution counter showing how many times the instruction has been executed.

From the Jitana coverage report, we first extract the bytecode traces whose execution counter is
5 or fewer. These bytecodes are considered to belong to rarely executed paths. Then, with the help
of manual analysis and the red reports of the engagement apps, we highlight the known malicious
code locations among the rarely executed bytecodes.
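
The extraction step above can be sketched as follows. This is a minimal illustration, not Jitana's
actual implementation: the class and field names (BytecodeTrace, RarePathFilter, and so on) are
hypothetical and only mirror the trace contents described in the text, and the sample signatures
in main are invented. The threshold of 5 is the only value taken from the report.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical model of one entry in a bytecode coverage report (names are illustrative).
final class BytecodeTrace {
    final String classSignature;
    final String methodSignature;
    final int bytecodeIndex;
    final long executionCount;

    BytecodeTrace(String classSignature, String methodSignature,
                  int bytecodeIndex, long executionCount) {
        this.classSignature = classSignature;
        this.methodSignature = methodSignature;
        this.bytecodeIndex = bytecodeIndex;
        this.executionCount = executionCount;
    }
}

final class RarePathFilter {
    // Threshold taken from the text: a counter of 5 or fewer marks a rarely executed bytecode.
    static final long RARE_THRESHOLD = 5;

    // Returns the traces whose execution counter is at or below the threshold, in report order.
    static List<BytecodeTrace> rarelyExecuted(List<BytecodeTrace> report) {
        List<BytecodeTrace> rare = new ArrayList<>();
        for (BytecodeTrace trace : report) {
            if (trace.executionCount <= RARE_THRESHOLD) {
                rare.add(trace);
            }
        }
        return rare;
    }

    public static void main(String[] args) {
        // Invented sample data, for demonstration only.
        List<BytecodeTrace> report = Arrays.asList(
                new BytecodeTrace("Lcom/example/Main;", "onCreate(Landroid/os/Bundle;)V", 0, 120),
                new BytecodeTrace("Lcom/example/Main;", "onClick(Landroid/view/View;)V", 4, 5),
                new BytecodeTrace("Lcom/example/Sync;", "upload()V", 2, 1));
        for (BytecodeTrace t : rarelyExecuted(report)) {
            System.out.println(t.classSignature + "->" + t.methodSignature
                    + " @" + t.bytecodeIndex + " executed " + t.executionCount + " time(s)");
        }
    }
}
```

The known malicious locations would then be looked up among the returned traces, which is the
manual step the text describes.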

We first performed analysis across engagements. We report the results in Table 7. Even with our
best efforts to provide good code coverage through random and unit testing, we were able to
execute only 49% of the malware locations in total. We also find that it is easier to generate
inputs that achieve good code coverage for apps in Engagements 1A-1C. The average code coverage
for the 18 apps from the first engagement is 70%. On the other hand, it is much more challenging
to generate inputs that yield high code coverage for the remaining apps. In this case, in spite of
our best efforts, we reach an average of only 43%. This may indicate that these later apps are
more complex.

Next, we selected applications that APEx can run without encountering any runtime errors. These
apps are listed in Table 8 and are the same apps used to evaluate the effectiveness of APEx in
Section 4.2. We report the number of malicious locations we retrieved from the red reports and
the total number of times the malicious bytecodes were executed. The malicious call sites in
FullControl, MorseCode, and smsBackup were executed during both test runs. In these cases, monkey
executed the malicious call sites many more times than the unit tests did, because monkey applied
a much greater number of events. On the other hand, unit tests were able to trigger malicious
activities where monkey could not for CalcC, KitteyKittey, PasswordSaver, and SourceViewer. The
hardest-to-reach malicious call sites belong to CalcA and CalcF, where neither monkey nor unit
tests could trigger the malicious activities.

Table 8. Execution Counter of Malicious Call Sites

                # of malicious   # of malicious call site   # of malicious call site
App name        call sites       executions (monkey)        executions (unit test)
CalcA           1                0                          0
CalcC           1                0                          15
CalcF           1                0                          0
FullControl     1                45                         5
KitteyKittey    1                0                          5
MorseCode       1                18                         1
PasswordSaver   1                0                          2
smsBackup       1                168                        2
SourceViewer    1                0                          4

The rarely executed malicious call sites are then given to APEx as targets, and APEx uses
concolic execution to generate input sequences for them. As explained in Section 3.2, during the
input generation process, symbolic path summaries whose execution logs contain those bytecode
instructions are given high priority in the sequence generation process.
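
The prioritization rule in Section 5.1, under which path summaries whose execution logs contain a
target bytecode instruction are handled first, can be illustrated with a short sketch. It is only
an illustration under simplifying assumptions: PathSummary and TargetPrioritizer are hypothetical
names, locations are represented as plain strings, and APEx's actual data structures and
scheduling policy are not described at this level of detail in the report.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical stand-in for a symbolic path summary: an id plus the bytecode
// locations recorded in its execution log.
final class PathSummary {
    final String id;
    final Set<String> executedLocations;

    PathSummary(String id, String... executedLocations) {
        this.id = id;
        this.executedLocations = new HashSet<>(Arrays.asList(executedLocations));
    }
}

final class TargetPrioritizer {
    // True if the summary's execution log contains at least one target location.
    static boolean touchesTarget(PathSummary summary, Set<String> targets) {
        for (String target : targets) {
            if (summary.executedLocations.contains(target)) {
                return true;
            }
        }
        return false;
    }

    // Returns a copy ordered so that summaries touching a target come first.
    // List.sort is stable, so the original order is preserved within each group.
    static List<PathSummary> prioritize(List<PathSummary> summaries, Set<String> targets) {
        List<PathSummary> ordered = new ArrayList<>(summaries);
        ordered.sort((a, b) -> Boolean.compare(
                !touchesTarget(a, targets), !touchesTarget(b, targets)));
        return ordered;
    }
}
```

A two-level ordering is the simplest reading of "high priority"; a real engine might instead use
weighted scores or a priority queue.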

We use three apps from the set in Table 8 as test apps: CalcA, CalcF, and SourceViewer. The
malicious bytecode locations are fed into APEx to see whether APEx can generate correct event
sequences. The results are shown in Table 9. As we can see, APEx was able to reach the malicious
call site in SourceViewer with an event sequence of length 6. Unfortunately, APEx did not reach
the targets in the other two apps. In CalcA and CalcF, the malicious activities both contain the
API call GregorianCalendar.get(hourOfDay), which prevented APEx from generating correct event
sequences. APEx was not able to generate sequences for targets in other apps because the
constraints needed to generate such sequences could not be resolved.

Table 9. APEx Result in Malware Target Prioritized Input Generation

App name       Target reached by APEx   Max sequence length
CalcA          No                       N/A
CalcF          No                       N/A
SourceViewer   Yes                      6

5.2  Inter-App  Communication  Analysis

Reusable components are an integral part of Android app development. There are four types of
components in Android. (1) Activities are the user interface components of an app. (2) Services
run, in the background, tasks that do not require any UI or that are too long to run on the UI
thread. (3) Broadcast receivers are components that can receive a message from any app.
(4) Content providers work like databases and are used for sharing data between apps.

All components except content providers use intents to achieve inter-component communication.
There are two types of intents: explicit and implicit. An explicit intent causes one particular
component to begin executing by naming it with its fully-qualified class name. An implicit intent
specifies an action but does not say which component needs to run. An AndroidManifest.xml file
defines intent-filters that help connect actions with components.* Through these two types of
intents, Inter-Component Communication (ICC) and Inter-App Communication (IAC) can occur. With
IAC, data can flow from one app to another. Being able to identify these communication channels
can help engineers and analysts identify API incompatibilities after system or software updates,
as well as vulnerable communication channels that cyber-criminals can exploit to compromise
systems [17].
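
To make the distinction between the two intent types concrete, the sketch below models intent
resolution in plain Java. It is a deliberately simplified illustration and does not use the
Android SDK: ComponentModel, IntentModel, and IntentResolver are hypothetical names, and real
Android resolution also considers categories, data URIs, MIME types, and export settings, all of
which are omitted here.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Simplified, hypothetical model of a component declared in an AndroidManifest.xml file.
final class ComponentModel {
    final String app;                 // owning app
    final String className;           // fully-qualified class name
    final Set<String> filterActions;  // actions listed in the component's intent-filters

    ComponentModel(String app, String className, String... filterActions) {
        this.app = app;
        this.className = className;
        this.filterActions = new HashSet<>(Arrays.asList(filterActions));
    }
}

// Simplified, hypothetical model of an intent (not android.content.Intent).
final class IntentModel {
    final String targetClass;  // set for explicit intents, null otherwise
    final String action;       // set for implicit intents, null otherwise

    private IntentModel(String targetClass, String action) {
        this.targetClass = targetClass;
        this.action = action;
    }

    static IntentModel explicit(String targetClass) {
        return new IntentModel(targetClass, null);
    }

    static IntentModel implicit(String action) {
        return new IntentModel(null, action);
    }
}

final class IntentResolver {
    // Explicit intents match by class name; implicit intents match any
    // component whose intent-filters declare the requested action.
    static List<ComponentModel> resolve(IntentModel intent, List<ComponentModel> installed) {
        List<ComponentModel> receivers = new ArrayList<>();
        for (ComponentModel c : installed) {
            boolean matches = intent.targetClass != null
                    ? c.className.equals(intent.targetClass)
                    : c.filterActions.contains(intent.action);
            if (matches) {
                receivers.add(c);
            }
        }
        return receivers;
    }
}
```

Each receiver that belongs to a different app than the sender corresponds to a potential IAC
edge; receivers in the same app correspond to ICC edges.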


To detect information flow through IAC and ICC channels, Li et al. [17, 19] introduce IccTA, a
Soot [15] based framework used to perform cross-app and cross-component taint analysis. Figure 7
highlights the workflow IccTA uses to perform IAC analysis, a workflow that includes several
tools: Epicc [24], ApkCombiner [18], Dexpler, and FlowDroid. First, Epicc stores information on
edges that can potentially represent IAC connections in a database. Second, ApkCombiner combines
multiple Android Packages (APKs) into a single DEX file. Next, to facilitate analysis by Soot,
Dexpler converts DEX instructions to Jimple instructions. The combined apps are then analyzed by
Soot to extract IAC edges and other data. Finally, FlowDroid builds complete control-flow graphs
for the combined components using results from Soot and the information stored in the database by
Epicc. It then performs taint analysis.

The foregoing process requires five tools to perform a workflow involving six steps (combining
components, storing information, conversion, analysis, building control-flow graphs, and taint
analysis), and each step creates temporary data.

Next, we compared the performance of IccTA with that of Jitana for IAC analysis. Jitana does not
yet support taint analysis itself, so at best we can compare only the steps of the overall
analysis that support the taint analysis step. However, because IccTA uses FlowDroid both to
identify edges and to perform taint analysis, we cannot remove just the taint analysis step.
Therefore, we chose to remove the overhead of FlowDroid altogether. Since Jitana does perform
control-flow analysis to detect IAC connections, our comparison errs in favor of IccTA.

* http://developer.android.com/guide/components/intents-filters.html

Both Jitana and FlowDroid detect IACs due to implicit and explicit intents, so we study apps that
produce both types of IAC connections. We also added 26 apps randomly selected from the list of
the Google Play store's top-100 apps. These additional apps include social-networking, game, and
other apps. Table 10 provides details on the apps, organizing them into eight groups (each
represented by a row in the table). We apply each of the approaches to each group of apps to
collect performance data when all apps within a group are analyzed simultaneously. This means
that for IccTA we attempted to use ApkCombiner to combine all apps within a group, and for Jitana
we attempted to use the CLVM to load the apps within a group.

Table 10. Comparing Analysis Times and the Number of Discovered IAC Connections Between IccTA
and Jitana (Note That (-) Indicates That the Analysis Process Failed to Complete)

                                            IccTA                          Jitana
Applications                   Source       Time      Implicit  Explicit   Time      Implicit  Explicit
                                            (seconds) IACs      IACs       (seconds) IACs      IACs

Echoer.apk,                    DroidBench   88.2      2         0          8.6       2         0
  StartActivityForResult1.apk,
  SendSMS.apk
  (Total size = 760 KB)

Dragon.apk,                    JitanaBench  53.0      0         1          7.6       0         1
  Morsecode.apk
  (Total size = 444 KB)

App1_Source.apk,               JitanaBench  (-)       (-)       (-)        9.0       1         0
  App2_Sink.apk
  (Total size = 524 KB)

com.facebook.katana-2.apk,     Play Store   (-)       (-)       (-)        35.4      8         0
  com.facebook.orca-1.apk,
  com.spotify.music.apk
  (Total size = 80 MB)

6 soc. network apps            Play Store   (-)       (-)       (-)        82.4      39        0
  (Total size = 150 MB)

13 games                       Play Store   (-)       (-)       (-)        191.4     0         0
  (Total size = 585 MB)

7 random apps                  Play Store   (-)       (-)       (-)        120.2     4         0
  (Total size = 127 MB)

Combine all 26 apps            Play Store   (-)       (-)       (-)        (-)       (-)       (-)
  (Total size = 860 MB)

Columns 3-5 of Table 10 show the results obtained using IccTA, and Columns 6-8 show the results
obtained using Jitana. We present results using three metrics: the time required in seconds to
perform the analysis and the numbers of implicit and explicit IAC connections detected. Entries of
the form “(-)” indicate cases in which the approach was unable to perform the given analysis. As
the table shows, IccTA was able to analyze only the first two groups, which consist of
microbenchmarks. It failed to detect IAC connections in Group 3 (Row 3), which also consists of
Jitana microbenchmarks. On the microbenchmark apps on which it functioned fully, it took 88.2
seconds and 53.0 seconds to analyze the programs. We also find that existing problems in IccTA's
components become limitations. For example, ApkCombiner has been evaluated only on components
smaller than 1.4 MB in size [18], and required 200 to 400 seconds. When applied to larger apps
such as Facebook and Spotify, it failed. As also noted by Li et al. [18], ApkCombiner does not
guarantee correctness of the combined file. This also limits the number of components that can be
combined and analyzed by IccTA. IccTA also failed to analyze the groups that use real-world apps
from the Play store due to their large sizes [18]. For the cases in which both approaches work,
Jitana was between 7 and 10 times faster than IccTA (7.6 versus 53.0 seconds and 8.6 versus 88.2
seconds), even though Jitana was computing complete control-flow graphs and using them to
construct IAC graphs.


5.3  In  Situ  Visualization  of  Code  Coverage 


Figure  8:  In-Situ  Visualization  with  TraVis 


Our third use case involves using Jitana to provide real-time feedback of code coverage
measurement. Code coverage is an important metric for assessing the quality of test suites.
Because code coverage is measured as a program is exercised, measuring it is a form of dynamic
analysis. To measure code coverage in Android, EMMA** is still commonly used. EMMA was initially
created as a code coverage tool for Java, but it now works for Android too. It supports both
targeted unit testing and random testing using Monkey, an Android UI exerciser.*** When used, it
adds between 5% and 20% overhead to code execution time for Java programs. For event-based Android
apps, our preliminary investigation using 10 apps reveals the overhead to be between 1% and 10%,
due to the necessary delays that must be added between two events. With this level of overhead,
we do not expect Jitana to achieve significant performance benefits over EMMA on event-based apps.
Instead, the main benefit of Jitana is that it lets engineers and analysts observe the progress of
the on-going analysis and monitor the intermediate results of code coverage measurement.

With Jitana, runtime information (basic block coverage in the case considered) is generated and
processed immediately. Thus, there is an opportunity to visualize coverage information in situ.
This section describes the process we followed to create a visualization tool, TraVis, inside
Jitana. We then illustrate the capabilities of TraVis and show how it can process the execution
information sent by Dalvik to provide code coverage feedback.

** http://emma.sourceforge.net/
*** http://developer.android.com/tools/help/monkey.html

Via a JDWP connection, TraVis periodically receives the dynamic execution information necessary
to visualize traces from the Dalvik VM, which was modified as described in Section 3.1.5. As soon
as a DEX file on the device is loaded on the virtual machine, TraVis is notified with the file
name. Upon notification, TraVis does the following: (i) copies the loaded DEX file from the device
to the workstation using an adb pull command; (ii) creates a buffer to store counter values for
the DEX file; and (iii) lets Jitana load the DEX file to update the VM graphs.

TraVis also polls for counter values every 50 milliseconds. The values are sent as an array of
pairs, each consisting of an instruction offset and the number of times that instruction has been
executed since the last poll. The counter values are accumulated in the buffer created when the
DEX file was loaded. The instruction graphs are updated with the new counter values from this
buffer. This data is presented as traces on a screen with OpenGL renderings, and as instruction
graphs rendered in a Graphviz viewer.
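
The accumulation of per-poll deltas into a per-DEX-file buffer can be sketched as follows. This is
an illustrative model only: CounterBuffer is a hypothetical name, each pair is assumed to arrive as
a two-element int array of {instruction offset, executions since the last poll}, and the actual
wire format and buffer layout used by TraVis are not specified in the report.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical counter buffer for one loaded DEX file.
final class CounterBuffer {
    // Running total of executions, keyed by instruction offset.
    private final Map<Integer, Long> totals = new HashMap<>();

    // Adds one poll result. Each element is assumed to be
    // {instructionOffset, executionsSinceLastPoll}.
    void accumulate(int[][] poll) {
        for (int[] pair : poll) {
            totals.merge(pair[0], (long) pair[1], Long::sum);
        }
    }

    // Total executions seen so far for the instruction at the given offset.
    long total(int offset) {
        return totals.getOrDefault(offset, 0L);
    }
}
```

Because each poll reports only the executions since the previous poll, summing the deltas yields
the cumulative counters that are shown on the instruction graphs.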

Figure 8 illustrates how TraVis can be used. A device (a Nexus 7 in this case) is first connected
to a workstation. Runtime information is sent from the device to the workstation, which runs
Jitana. In the figure, a person is playing SuperDepth, a classic video game (shown in the lower
right quadrant). While the user plays, Dalvik sends execution information on-the-fly to Jitana,
which processes the information to calculate code coverage, which is then fed to TraVis. The app
requires no instrumentation.

In Figure 8, the upper left quadrant displays the method graph for SuperDepth and the upper right
quadrant displays two instruction graphs. Shaded boxes indicate entry instructions in basic
blocks. Each such box also contains a counter that indicates the number of times the basic block
has been executed. For example, the block highlighted by an ellipse has been executed 20 times.
The block above it corresponds to a conditional statement and, so far, all decisions have taken
the left branch. Note that these counters are continuously updated.

The bottom left quadrant shows the output of TraVis. Each small rectangle on what looks like a
“keyboard” in the figure represents a basic block. On a color display (of this paper or of the
output of TraVis), yellow rectangles indicate basic blocks that have not yet been executed, blue
rectangles indicate “hot” basic blocks (i.e., basic blocks that have been executed more than five
times), magenta rectangles indicate basic blocks that are currently being executed, and black
rectangles indicate basic blocks that have been executed fewer than five times. (On a
black-and-white printout, the colors range from dark gray to light gray with two intermediate
shades.) The video clip from which the images were captured is available at
https://www.youtube.com/watch?v=sPdrLdIKDx4.
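
The color scheme can be summarized as a small decision function. The sketch below is hypothetical
and rests on two assumptions that the text does not settle: a block that is currently executing is
drawn in magenta regardless of its counter, and a block executed exactly five times is treated as
not hot (the text defines only "more than five" and "fewer than five"). BlockColor and
BlockColoring are illustrative names, not TraVis identifiers.

```java
// Colors used in the TraVis output, as described in the text.
enum BlockColor { YELLOW, BLUE, MAGENTA, BLACK }

final class BlockColoring {
    // A block is "hot" when it has been executed more than this many times.
    static final long HOT_THRESHOLD = 5;

    static BlockColor colorOf(long executions, boolean executingNow) {
        if (executingNow) {
            return BlockColor.MAGENTA;   // assumption: "currently executing" takes precedence
        }
        if (executions == 0) {
            return BlockColor.YELLOW;    // not yet executed
        }
        if (executions > HOT_THRESHOLD) {
            return BlockColor.BLUE;      // hot block
        }
        return BlockColor.BLACK;         // assumption: exactly five is grouped with "fewer than five"
    }
}
```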


5.4  Device  Analysis  in  BYOD  Environments 

To evaluate the scalability of Jitana, we also attempted to analyze all apps on a device
simultaneously when analyzing apps for IAC connections. To do this, we selected three devices
from our stockpile of Android tablets. The first device contains 83 apps, the second contains 90
apps, and the third contains 106 apps. We pulled all the APKs from a given device into Jitana,
using the CLVM to load all the apps simultaneously and construct the graphs needed to build IAC
graphs to detect connections. We anticipate that this particular task is an example of an analysis
that could be useful for vetting devices in organizations that promote bring-your-own-device
(BYOD) policies. In such situations, instead of considering an app as the unit of analysis, we
consider a device as the unit of analysis.


Table 11. IAC Analysis Results for Three Devices

Device ID   # of Apps   Total Size (MB)   Edges (Implicit)   Edges (Explicit)   Time (seconds)
1           83          129               476                21                 136
2           90          438               1216               50                 601
3           106         1298              (-)                (-)                (-)

Table 11 reports the results of this investigation. As shown, Jitana was able to analyze two of
the devices (those with 83 and 90 apps) successfully. The time required to perform these two
analyses was 136 seconds and 601 seconds, respectively. Note that each reported time includes the
time needed to load the apps from the device to the workstation running Jitana. Our system ran
out of memory for the device with 106 apps. Nevertheless, the results on the first two devices
suggest that Jitana can support larger-scale analyses by using larger computer clusters. Further,
by using BGL, Jitana should be able to perform analyses using any machines that provide BGL
support.


5.5  Analysis  of  Reflection  Usage  in  Android  Apps 


In the Java programming language, which is the main programming language for Android application
development, reflection provides a computer program with the ability to load certain classes
during execution [27]. The main goal is to allow a program to examine and modify its structure and
behavior dynamically. As such, reflection is one important mechanism that developers use to
achieve backward compatibility. In addition, reflection is also a powerful tool that can help with
debugging as well as with developing pluggable code.

However, the dynamic property of reflection has also been exploited by malware authors to obscure
intentions or hide malicious payloads from malware analysis tools. Typical malware analysis tools
that analyze source code or bytecode to detect vulnerabilities tend to have trouble analyzing
reflection target classes. While static analysis can commonly be used to identify where reflection
calls are being made, the dynamic nature of reflection requires that analysts have the input
sequences that can exercise these reflection call sites. However, creating precise sequences that
can reach specific targets in event-based and GUI-rich applications is still quite challenging.

To better understand how reflection is used in Android app development, we investigate its usage
in real-world Android applications. To do so, we collected nearly 1800 Android app samples. We
divide the apps into three groups, as shown in Table 12.

The Android application samples consist of 1258 malware samples from the Android Malware Genome
Project (AMGP), 378 newer (after 2012) malware samples collected from various sources, and 126
popular Android apps from 2014 that were downloaded from the Google Play Store. To determine
reflection usage, we first implemented a static analysis in Soot based on the idea introduced by
Bodden et al. We also utilized Apktool, a reverse engineering tool for Android apps, to help
accomplish this task. A Reflection Information Table is generated by a reflection logger to record
each identified reflection call site.

Table 12. Android Application Sample Groups

App sample group                 Total #   Time
Android Malware Genome Project   1258      2010-2012
Newer Malware                    378       after 2012
Google Play Store Top Chart      126       2014

We then classified reflection into four types, as shown in Table 13, based on different Java
semantics. Typically, method invocation via reflection involves the following APIs:

•  Class.forName,

•  Class.getDeclaredMethod (or Class.getMethod),

•  Method.invoke.

The name parameter used in forName() or getDeclaredMethod() can be either a constant string or a
string variable. The class object in a Class.forName() call is found by either the default class
loader or a custom class loader.

Table 13. Reflection Classification

Category   Reflection Target   Class Loader
1(a)       Constant string     Default
1(b)       Constant string     Custom
2(a)       String variable     Default
2(b)       String variable     Custom
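
The classification in Table 13 depends on two properties of a reflection call site: whether the
target name is a constant string, and whether a custom class loader is involved. The sketch below
encodes that mapping, together with the observation made in this section that only Type 1(a)
targets can be determined by simple static analysis. ReflectionClassifier is a hypothetical helper
written for illustration; it is not the reflection logger used in the study.

```java
final class ReflectionClassifier {
    // Maps the two call-site properties to the category labels of Table 13.
    static String classify(boolean targetIsConstantString, boolean usesCustomClassLoader) {
        String target = targetIsConstantString ? "1" : "2";
        String loader = usesCustomClassLoader ? "(b)" : "(a)";
        return target + loader;
    }

    // Per the discussion in this section, every category except 1(a)
    // needs dynamic analysis to determine targets precisely.
    static boolean needsDynamicAnalysis(String category) {
        return !category.equals("1(a)");
    }

    public static void main(String[] args) {
        boolean[] flags = {true, false};
        for (boolean constant : flags) {
            for (boolean custom : flags) {
                String category = classify(constant, custom);
                System.out.println(category + " needs dynamic analysis: "
                        + needsDynamicAnalysis(category));
            }
        }
    }
}
```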

Each type of reflection call requires a different technique to identify its targets. An example
of Type 1(a) is shown in Figure 9. The method “MyMethod” of class “MyClass” is invoked via
reflection APIs. The reflection targets, i.e., the string names, are constant strings. The default
system ClassLoader is designated at runtime to look for class “MyClass”. Determining targets for
this type of reflection call requires only simple static analysis. The string names are already
known, and the binaries of class “MyClass” can come only from the classes.dex within the APK file
or from system libraries.

// Both target names are constant strings; the default class loader is used.
Class cl = Class.forName("MyClass");
Object obj = cl.newInstance();
Method m = cl.getDeclaredMethod("MyMethod");
m.invoke(obj);


Figure  9:  Reflection  Type  1(a) 

An example of Type 1(b) is shown in Figure 10. In this example, the reflection targets are still
constant strings, the same as in Type 1(a). However, additional parameters are used in the
Class.forName() call. The third parameter, loader, is a custom class loader object. A custom class
loader can specify an arbitrary path from which to load classes at runtime. As such, it is
possible that the class binaries are placed outside of the system library and the classes.dex from
the APK file. This type of reflection call requires dynamic analysis approaches to precisely
determine targets.

// Constant target names, but a custom class loader decides where the class is loaded from.
DexClassLoader loader = new DexClassLoader(
        libpath, dir, null, getloader());
Class cl = Class.forName("MyClass", true, loader);
Object obj = cl.newInstance();
Method m = cl.getDeclaredMethod("MyMethod");
m.invoke(obj);


Figure 10: Reflection Type 1(b)

An example of Type 2(a) is shown in Figure 11. The target names in Class.forName() and
Class.getDeclaredMethod() are provided as string variables, and the default class loader is used
in the Class.forName() call. It is guaranteed that the class is loaded from the system library
path by the system class loader. However, dynamic analysis approaches are needed to retrieve the
class name or the method name and thus to precisely determine targets.

// Target names are string variables whose values are known only at runtime.
Class cl = Class.forName(className);
Object obj = cl.newInstance();
Method m = cl.getDeclaredMethod(methodName);
m.invoke(obj);


Figure  11:  Reflection  Type  2(a) 

An example of Type 2(b) is shown in Figure 12. In this example, the reflection target name is
provided as a string variable. A custom class loader is also used in Class.forName() to search for
and load the target class. As such, static analysis alone is not enough to find the class name or
the library path of the class loader, especially when runtime data such as user input is involved.
This type of reflection call requires dynamic analysis approaches to determine target information.

// Both the target names and the class loader are determined at runtime.
DexClassLoader loader = new DexClassLoader(
        libpath, dir, null, getloader());
Class cl = Class.forName(className, true, loader);
Object obj = cl.newInstance();
Method m = cl.getDeclaredMethod(methodName);
m.invoke(obj);


Figure  12:  Reflection  Type  2(b) 

We ran our reflection analysis on every application in the three groups and counted the number of
reflection calls of each type. The results for the AMGP malware and the newer malware are shown in
Table 14. As we can see, 78.7% of the reflection calls in AMGP malware belong to Type 1(a), for
which static analysis should be able to effectively determine reflection targets. We also find
that about 20% of the reflection calls belong to Type 2(a), which means that these reflection
calls use string variables to specify invocation targets. Resolving this type of reflection call
requires a dynamic analysis approach.

Table 14. Reflection Usage in Malware Samples

Malware sample group     AMGP                             Newer Malware
Total number of APKs     1258                             378
Year                     2010-2011                        2012-2014

Reflection               Number of                        Number of
Classification           reflection calls   Percentage    reflection calls   Percentage
1(a)                     6357               78.7%         946                20%
1(b)                     2                  0.02%         7                  0.1%
2(a)                     1664               20.6%         3613               76.2%
2(b)                     56                 0.7%          176                3.7%

On the other hand, in the newer malware group, 20% of the reflection calls belong to Type 1(a) and
76% belong to Type 2(a). In both malware groups, there are not many cases of Types 1(b) and 2(b),
where a custom class loader is involved. However, we can observe a trend toward an increasing
percentage of Type 2(a) reflection calls.

The results for the Play Store apps are shown in Table 15.


Table 15. Reflection Usage in Play Store Apps

Total number of APKs     126
Year                     2014

Reflection               Number of
Classification           reflection calls   Percentage
1(a)                     1774               53.9%
1(b)                     31                 0.9%
2(a)                     1315               40%
2(b)                     170                5.2%

In these 126 top downloaded Play Store apps, we can see a similarity to the malware sample
groups: reflection Types 1(a) and 2(a) have the highest percentages of the four categories.
However, we do notice that Type 2(b) reflection calls account for a larger share in Play Store
apps (5.2%) than in either malware sample group (0.7% and 3.7%).

Because the number of reflection calls is also related to the number of samples within each
group, we calculate the reflection density of each sample group in order to observe the trend of
reflection usage in these samples. First, we calculate the reflection density per class, as shown
in Table 16.

As shown, Play Store apps have the highest number of classes and the highest number of classes
that contain reflection calls. However, the newer malware group has the highest reflection density
per class.

The reflection density per method is shown in Table 17. As with the reflection density per class,
the Play Store apps have the highest number of methods and of methods containing reflection calls.
However, the newer malware group again has the highest reflection density per method.
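
The density values follow directly from the counts reported in Tables 16 and 17: the number of
classes (or methods) that contain reflection calls divided by the total number of classes (or
methods), expressed as a percentage. The sketch below reproduces that arithmetic; the class and
method names are illustrative, and only the counts are taken from the tables.

```java
final class ReflectionDensity {
    // Percentage of units (classes or methods) that contain at least one reflection call.
    static double percent(long unitsWithReflection, long totalUnits) {
        if (totalUnits <= 0) {
            throw new IllegalArgumentException("totalUnits must be positive");
        }
        return 100.0 * unitsWithReflection / totalUnits;
    }

    public static void main(String[] args) {
        // Counts from Table 16 (per class).
        System.out.printf("AMGP, per class:            %.2f%%%n", percent(3015, 299126));
        System.out.printf("Newer malware, per class:   %.2f%%%n", percent(1975, 118267));
        System.out.printf("Play Store apps, per class: %.2f%%%n", percent(4317, 372985));
        // Counts from Table 17 (per method).
        System.out.printf("AMGP, per method:            %.2f%%%n", percent(4494, 1718380));
        System.out.printf("Newer malware, per method:   %.2f%%%n", percent(3248, 839109));
        System.out.printf("Play Store apps, per method: %.2f%%%n", percent(7423, 2364347));
    }
}
```

Normalizing by the number of classes or methods makes the three groups comparable even though
they differ in the number of samples and in total code size.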

Table 16. Reflection Density per Class

                                   # of classes with   Density
Sample Collection   # of classes   reflection calls    (per class)
AMGP                299,126        3,015               1.01%
newer malware       118,267        1,975               1.67%
Play Store Apps     372,985        4,317               1.16%

Table 17. Reflection Density per Method

                                   # of methods with   Density
Sample Collection   # of methods   reflection calls    (per method)
AMGP                1,718,380      4,494               0.26%
newer malware       839,109        3,248               0.39%
Play Store Apps     2,364,347      7,423               0.31%

In summary, while the percentages of reflection usage remain relatively flat among the three
groups of applications (1.01% to 1.67% of classes), we see that modern Android applications (newer
malware and Play Store apps) are increasingly using Type 2(a) reflection calls, whose targets
cannot be determined through static analysis alone. By capturing runtime information, Jitana can
be used to identify these reflection targets. APEx can also be used to generate event sequences
that reach these reflection calls.

Recently, we have also seen more usage of Type 2(b) reflection in modern apps. Furthermore, a recent technique for removing reflection code after each run (e.g., a commonly used mechanism to serve advertisements [13]) can also be used to deliver malicious code and then delete it after it has been executed. To prevent later retrieval by analysts, attackers can also move the code away from the original download site after it has been downloaded. By working closely with Dalvik, we are able to extend the mechanism used in TraVis to cache dynamically loaded classes for analysis. This feature is critical for security analysts who need to detect malicious payloads that may be hiding as reflective code and software updates.
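
One way to picture the caching idea is a loader wrapper that copies every dynamically loaded file into an analyst-controlled cache before the app can delete it. The Python sketch below is only an analogy under stated assumptions: `cache_dynamic_code`, the cache directory layout, and the SHA-256 file naming are hypothetical and do not describe how Jitana or TraVis hook into Dalvik.

```python
import hashlib
import os
import shutil
import tempfile


def cache_dynamic_code(payload_path, cache_dir):
    """Copy a dynamically loaded file into cache_dir, named by its SHA-256 digest.

    Hypothetical helper: a real hook would run inside the VM's class loader.
    """
    with open(payload_path, "rb") as f:
        digest = hashlib.sha256(f.read()).hexdigest()
    os.makedirs(cache_dir, exist_ok=True)
    cached_path = os.path.join(cache_dir, digest + ".bin")
    if not os.path.exists(cached_path):  # identical payloads are stored once
        shutil.copyfile(payload_path, cached_path)
    return cached_path


def demo():
    """Simulate an app that loads a payload and then deletes it."""
    workdir = tempfile.mkdtemp()
    payload = os.path.join(workdir, "downloaded.dex")
    with open(payload, "wb") as f:
        f.write(b"stand-in for dynamically loaded bytecode")
    cached = cache_dynamic_code(payload, os.path.join(workdir, "cache"))
    os.remove(payload)  # the app erases its tracks after execution
    with open(cached, "rb") as f:
        return os.path.exists(payload), f.read()
```

In this sketch the cached copy survives the deletion of the original file, which is the property an analyst needs when the payload is also withdrawn from its download site.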


6.0  FUTURE  WORK 


We have been developing Jitana and APEx over the past two years. We plan to officially release the source code and binaries of both frameworks under a BSD license in July 2016. At that time, we will also include additional tools that we are currently developing. Next, we discuss on-going efforts to produce these tools.

As shown in our BYOD use case, when a large number of apps are analyzed, Jitana can experience out-of-memory errors when run on a desktop or laptop. Because our approach is based on BGL, we are developing approaches for partitioning the processes used to generate graphs and perform analysis so that they can run in parallel on high-performance computing clusters. On the other hand, because Jitana incurs low overhead when used to analyze small numbers of apps, we plan to create a version that can run directly on an Android device to perform real-time analysis as an app is downloaded and then perform light-weight analysis and monitoring as apps run.

Currently, techniques such as TamiFlex and Harvester [5, 25] can detect reflective code by instrumenting apps to report reflection targets and then capturing them off-line. However, a recent technique for removing reflection code after each run (e.g., a commonly used mechanism to serve advertisements [13]) can also be used to deliver malicious code and then delete it after it has been executed. To prevent later retrieval by analysts, attackers can also move the code away from the original download site after it has been downloaded. By working closely with Dalvik, we are able to extend the mechanism used in TraVis to cache dynamically loaded classes for analysis. This feature is critical for security analysts who need to detect malicious payloads that may be hiding as reflective code and software updates.

We have begun an effort to integrate APEx with Jitana. We currently have a basic implementation of a symbolic execution engine in Jitana that can work on clusters, and we are refining the implementation to take advantage of existing constraint optimization frameworks to reduce runtime overhead. We are also developing a taint analysis engine for Jitana. Finally, as Google has already shifted from the Dalvik VM to Android Run Time (ART), one of our priorities is to make Jitana work with ART. We have already analyzed the structure of ART and have determined how to capture runtime information that Jitana can use to perform dynamic analysis. We are also working to extend Jitana to support analysis of binary components that interface with Android applications through the Java Native Interface (JNI).

7.0  CONCLUSION 

We have made three contributions that advance the state of the art in program analysis. First, by harnessing the power of generic programming and exploiting runtime events and information, we have built a highly efficient hybrid program analysis framework that is capable of expanding the analysis scope to cover most apps installed on an Android device. We provide common analysis building blocks that include control-flow, data-flow, and points-to analysis engines. Our evaluation results and use cases show that Jitana can analyze more classes (e.g., system-related and dynamically generated classes) in much less time than Soot. We also discussed on-going work to further extend the capabilities of Jitana, which includes supporting parallel analysis on clusters, migration to ART, and caching and analyzing dynamically loaded code.

Second, we developed APEx, a concolic-execution-based event sequence generator that produces complete GUI models and identifies paths and connections to the GUI models that can be used to generate event sequences to reach specific targets in a program. We applied Jitana to validate a hypothesis that malicious code exists in rarely executed paths. We found that in many engagement apps, half of the malicious locations are hard to reach using random and unit testing. We then used APEx to generate event sequences to reach those targets. Unfortunately, these hard-to-reach targets often involve calls to libraries and system APIs that are not fully supported by our concolic execution engines. As such, we were not able to generate sequences to reach targets in many apps.

Third, we showed through live examples how the proposed frameworks can be used to address emerging security challenges. We have shown that, by using the proposed frameworks, event sequences can be generated to exercise hard-to-reach targets. Complex analyses such as IAC detection can be quickly developed and then performed effectively and efficiently to address emerging security needs such as vetting devices in BYOD environments and detecting malicious apps that collude. Jitana can also analyze a large number of apps concurrently (it has successfully analyzed as many as 90 apps concurrently) and can provide real-time feedback to engineers and analysts so that they can evaluate the progress and effectiveness of an on-going analysis. We have also shown that it can efficiently handle the dynamism of modern programming languages, including identifying and analyzing reflective methods.

8.0 ACRONYMS AND GLOSSARY


ART      Android Run-Time
APEx     Android Path Explorer
BYOD     Bring Your Own Device
CLVM     Class Loader Virtual Machine
IccTA    Inter-component communication Taint Analysis
Jitana   Just-In-Time Analysis
JNI      Java Native Interface
JVM      Java Virtual Machine
VM       Virtual Machine